Characterization and identification of synonyms on anonymous social networks

Bibliographic details
Year of defense: 2019
Main author: Gomes, Janaína Sant’Anna Gomide
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Universidade Federal do Rio de Janeiro
Brazil
Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia
Programa de Pós-Graduação em Engenharia de Sistemas e Computação
UFRJ
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Access link: http://hdl.handle.net/11422/13547
Abstract: In many scenarios, objects are referred to by multiple labels, and this diversity leads to ambiguity. Resolving name ambiguity is an important step in data consolidation, and it has become even more pressing as the amount of digital data grows. Moreover, growing privacy concerns among individuals and enterprises are leading to the removal of personally identifiable information (PII) from publicly available data. In this work, we focus on the characterization and identification of synonyms in anonymous social networks, where all PII has been discarded and only the network structure is considered. The main contributions of this thesis are a classification of the name usage patterns of individuals who use multiple names, a probabilistic model for synonyms in social networks, and algorithms to identify synonyms in anonymous social networks. The first algorithm uses the distance between nodes and the number of common neighbors to identify synonyms in a social network. The second algorithm considers ego-centered collaboration networks and identifies the different nodes that correspond to the egonet owner; it is based on the dominating set and independent set problems in graphs. The last algorithm is a framework that classifies whether nodes have duplicates in a social network: it extracts subgraphs to generate node features, which are then used as input to a two-level neural network designed specifically for this problem. Real collaboration networks extracted from DBLP and Google Scholar, as well as familial networks, are used to evaluate the proposed algorithms. Experimental results indicate that synonyms can be identified effectively even on anonymous social networks by leveraging only network structure.
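
To make the idea behind the first algorithm more concrete, the sketch below shows one way that distance and common-neighbor counts could be combined to flag candidate synonym pairs using structure alone. It is not the thesis's algorithm: the function name `candidate_synonyms`, the `min_common` threshold, and the rule that candidates must be non-adjacent (that is, at distance exactly 2) are all assumptions made for illustration.

```python
from itertools import combinations


def candidate_synonyms(edges, min_common=2):
    """Flag node pairs that may be two labels for the same individual.

    Hypothetical heuristic (an assumption, not the thesis's method):
    a pair is a candidate when the two nodes are NOT adjacent but share
    at least `min_common` neighbors, which puts them at distance 2.
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    pairs = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            # Assumption: two labels of one person never link to each other.
            continue
        common = adj[u] & adj[v]
        if len(common) >= min_common:
            pairs.append((u, v, len(common)))

    # Strongest candidates (most shared neighbors) first.
    return sorted(pairs, key=lambda p: -p[2])


# Toy anonymous network: "a1" and "a2" share collaborators "x" and "y"
# but never appear together.
edges = [
    ("a1", "x"), ("a1", "y"),
    ("a2", "x"), ("a2", "y"), ("a2", "z"),
    ("x", "y"), ("z", "w"),
]
result = candidate_synonyms(edges, min_common=2)
print(result)
```

In this toy network only the pair `a1` and `a2` meets the threshold. A realistic version would need a threshold calibrated against the network's degree distribution, since high-degree nodes share many neighbors by chance.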