Investigação do processo de stemming na lingua portuguesa

Detalhes bibliográficos
Ano de defesa: 2008
Autor(a) principal: Alvares, Reinaldo Viana
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Programa de Pós-Graduação em Computação
Computação
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
KDD
Link de acesso: https://app.uff.br/riuff/handle/1/17898
Resumo: The information retrieval process is a usual task for the human. However, having a complex automation. This happens because the quality of the results is often related with the degree of the user's satisfaction, a difficult parameter to measure. In general this quality is evaluated being taking into account a group of queries in a text collection, and their relevant answers. Commonly, two evaluation measures are used in this process: the first is the precision, wich represents the proportion of recovered relevant items from the total of recovered items; and the second is the recall, wich represents the proportion of recovered relevant items from the total of relevant items of the collection. One of the challenges is to find efficient forms to represent the documents, in order to avoid ambiguity. An alternative to solve this problem consists of obtaining a unique representation for words that appear for a same concept. This task can be defined as stemming. Many times, the stemming process is dependent to the morphologic structure of the target language. For the Portuguese language, there were found few solutions to assist the demand for these algorithms. The morphologic complexity of Portuguese language, and the few stemming solutions found for this language, were the motivation for the research shown in this work. This work presents a new model for the stemming process, that is applicable to the Portuguese language, based on a statistical study accomplished in a collection of extracted words of the Brazilian Web. With objective of evaluating the model, a stemmer is implemented and compared with a solution found in the literature, especially developed for Portuguese. The main contributions of this work are the systematical model for the stemming process, besides the stemmer conceived and implemented specially for the Portuguese language.