Bibliographic details
Year of defense: |
2012 |
Main author: |
Alvarenga, Leonel Diógenes Carvalhaes
 |
Advisor: |
Rosa, Thierson Couto
 |
Examination committee: |
Not provided by the institution |
Document type: |
Dissertation
|
Access type: |
Open access |
Language: |
Portuguese |
Institution of defense: |
Universidade Federal de Goiás
|
Graduate program: |
Programa de Pós Graduação em Ciência da Computação (INF)
|
Department: |
Instituto de Informática (INF)
|
Country: |
Brazil
|
Keywords in Portuguese: |
|
Keywords in English: |
|
CNPq knowledge area: |
|
Access link: |
http://repositorio.bc.ufg.br/tede/handle/tde/2870
|
Abstract: |
Traditional text classification methods typically represent documents only as a set of words, known as a "Bag of Words" (BOW). Several studies have reported good results from using thesauri and encyclopedias as external information sources to expand the BOW representation by identifying synonymy and hyponymy relationships between terms present in a document collection. However, the expansion process may introduce terms that lead to erroneous classification. In this work, we propose using feature selection measures to select features extracted from Wikipedia, in order to improve the effectiveness of the expansion process. The study also proposes a feature selection measure called Tendency Factor to One Category (TF1C). The experiments showed that this measure is competitive with the other measures evaluated, Information Gain, Gain Ratio and Chi-squared, delivering the best gains in micro-F1 and macro-F1 in most experiments. Using all the features selected in this process proved more stable in assisting classification, whereas restricting their insertion only to documents of the classes in which these features are scored highly by the selection measures yielded lower performance. When applied to the Reuters-21578, Ohsumed first-20000 and 20Newsgroups collections, our approach to feature selection reduced the noise insertion inherent in the expansion process, improved the results obtained with hyponyms, and demonstrated that the synonym relationship from Wikipedia can also be used in document expansion, increasing the effectiveness of automatic text classification. |