Amostragem para grandes volumes de dados: uma aplicação em redes complexas

Detalhes bibliográficos
Ano de defesa: 2018
Autor(a) principal: Souza, Roberta Carneiro de
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Tese
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Federal do Rio de Janeiro
Brasil
Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia
Programa de Pós-Graduação em Engenharia Civil
UFRJ
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: http://hdl.handle.net/11422/11622
Resumo: The main objective of this work is to implement and to evaluate options of sampling plans of algorithms for calculation of betweenness centrality, a measure used to identify important and influential vertices in complex networks aiming to improve the quality of the estimates. For statistical evaluation of variability of the estimates, indicators used in sampling, but not yet in data mining in complex networks, will be proposed. The techniques used in combination to reach the objectives and propose a new algorithm were: sampling, clustering (or community detection) and parallel computing. The sampling feature has been widely used as a tool to reduce dimensionality in data mining problems to streamline processes and reduce costs with data storage. The techniques of grouping for the detection of communities have a high correlation with the measure to be estimated, the betweenness centrality. One of the factors used in choosing the methods used in the implementation of the algorithms was the possibility of using parallel or distributed computing. After the review of the literature and evaluation of the results of the experiments carried out, it is concluded that the proposed algorithm contributes to the state of the art of the use of sampling to estimate betweenness centrality in large complex networks, a challenge in the current scenario of big data, by adding several techniques that optimize the extraction of data knowledge. The proposed algorithm, in addition to improving the quality of the estimates, presented a reduction in the processing time while keeping the scalability.