Recuperação da informação através de busca comparada em domínio específico, baseado em expressões multipalavras

Detalhes bibliográficos
Ano de defesa: 2013
Autor(a) principal: Edson Marchetti da Silva
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Tese
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Federal de Minas Gerais
UFMG
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: http://hdl.handle.net/1843/BUOS-97XFVY
Resumo: Normally, the search engines in databases is performed using keywords provided by the user to perform the documents identification. This study aims to propose an additional alternative that can be aggregated to Information Retrieval Systems (IRS) to assist the user in the process of information search. This alternative allows the realization of an automated search based on a document supplied by the user which serves as a reference. In this context the object of study was the extraction of Multi Word Expressions (MWE) of the document to serve as descriptors of the search in aspecific corpus. The MWE are obtained by a deterministic method which proposed that considers the characteristics of the physical structure of the document and compares the result with that obtained for thirteen different measures of association statistics produced by Statistics Ngram Package (NSP), which considers the text as a set of bag of words. The results demonstrate that the proposed method provides a better semantic representation of the document bringing together qualitative gains in MWE extracted and that it contributes positively to the results of the search compared. From these experiments we have proposed and implemented a prototype of a compared search tool and it was present the results obtained with its use.