Processos de construção automática de tesauro

Detalhes bibliográficos
Ano de defesa: 2011
Autor(a) principal: Granada, Roger Leitzke lattes
Orientador(a): Lima, Vera Lúcia Strube de lattes
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Pontifícia Universidade Católica do Rio Grande do Sul
Programa de Pós-Graduação: Programa de Pós-Graduação em Ciência da Computação
Departamento: Faculdade de Informáca
País: BR
Palavras-chave em Português:
Área do conhecimento CNPq:
Link de acesso: http://tede2.pucrs.br/tede2/handle/tede/5158
Resumo: The advances in technology have made the amount of information available in digital format increase rapidly. This increase reflects on the importance of efficient systems to Information Retrieval (IR), getting the right information when it's requested by users. Thesauri can be associated with IR systems, allowing the system to query not only by the key term, but also by related terms, obtaining related documents that were not retrieved. The manual construction, long and costly process that gave rise to the first thesaurus, shall be performed automatically, using different methods and processes available today. With this motivation, this dissertation proposes to study three cases of automatic thesauri construction. One method uses statistical techniques to identify the best related terms. Another method uses syntactic knowledge, being necessary to extract, besides the grammatical categories of each term, the relations that a verb have with its subject or object. The latter method makes use of syntactic knowledge and semantic knowledge of the terms, identifying non apparent relations. For this, this latter method uses an adaptation of the Latent Semantic Analysis technique. We developed three methods for automatic thesaurus construction using documents from the field of data privacy. The results were applied to an IR system, allowing the evaluation by domain experts. In conclusion, we observed that, in certain cases, it's better to apply techniques that do not use semantic knowledge of the terms, obtaining better results with methods that use only the syntactic knowledge of them.