Machine learning via dynamical processes on complex networks

Detalhes bibliográficos
Ano de defesa: 2013
Autor(a) principal: Cupertino, Thiago Henrique
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Tese
Tipo de acesso: Acesso aberto
Idioma: eng
Instituição de defesa: Biblioteca Digitais de Teses e Dissertações da USP
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-25032014-154520/
Resumo: Extracting useful knowledge from data sets is a key concept in modern information systems. Consequently, the need of efficient techniques to extract the desired knowledge has been growing over time. Machine learning is a research field dedicated to the development of techniques capable of enabling a machine to \"learn\" from data. Many techniques have been proposed so far, but there are still issues to be unveiled specially in interdisciplinary research. In this thesis, we explore the advantages of network data representation to develop machine learning techniques based on dynamical processes on networks. The network representation unifies the structure, dynamics and functions of the system it represents, and thus is capable of capturing the spatial, topological and functional relations of the data sets under analysis. We develop network-based techniques for the three machine learning paradigms: supervised, semi-supervised and unsupervised. The random walk dynamical process is used to characterize the access of unlabeled data to data classes, configuring a new heuristic we call ease of access in the supervised paradigm. We also propose a classification technique which combines the high-level view of the data, via network topological characterization, and the low-level relations, via similarity measures, in a general framework. Still in the supervised setting, the modularity and Katz centrality network measures are applied to classify multiple observation sets, and an evolving network construction method is applied to the dimensionality reduction problem. The semi-supervised paradigm is covered by extending the ease of access heuristic to the cases in which just a few labeled data samples and many unlabeled samples are available. A semi-supervised technique based on interacting forces is also proposed, for which we provide parameter heuristics and stability analysis via a Lyapunov function. Finally, an unsupervised network-based technique uses the concepts of pinning control and consensus time from dynamical processes to derive a similarity measure used to cluster data. The data is represented by a connected and sparse network in which nodes are dynamical elements. Simulations on benchmark data sets and comparisons to well-known machine learning techniques are provided for all proposed techniques. Advantages of network data representation and dynamical processes for machine learning are highlighted in all cases