Variações do método kNN e suas aplicações na classificação automática de textos (Variations of the kNN method and their applications in automatic text classification)

SANTOS, Fernando Chagas

Bibliographic details
Year of defense:	2010
Main author:	SANTOS, Fernando Chagas
Advisor:	CARVALHO, Cedric Luiz de
Examination committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	Portuguese (por)
Degree-granting institution:	Universidade Federal de Goiás
Graduate program:	Master's in Computer Science
Department:	Exact and Earth Sciences - Computer Science
Country:	BR
Keywords in Portuguese:	Classificação de textos; Aprendizagem de máquina; Método kNN; Critérios de seleção; Geração de características; Geração de termos
Keywords in English:	Text Classification; Machine Learning; kNN Method; Feature Selection; Feature Construction
CNPq knowledge area:	CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	http://repositorio.bc.ufg.br/tede/handle/tde/499
Abstract:	Most research on Automatic Text Categorization (ATC) seeks to improve the performance (effectiveness or efficiency) of the classifier responsible for automatically classifying a document d that has not yet been labeled. The k nearest neighbors (kNN) method is simple and is one of the most effective automatic classification methods proposed so far. In this work we propose two kNN variations, Inverse kNN (kINN) and Symmetric kNN (kSNN), with the aim of improving the effectiveness of ATC. The kNN, kINN and kSNN methods were applied to the Reuters, 20NG and Ohsumed collections. The results showed that kINN and kSNN were more effective than kNN on the Reuters and Ohsumed collections, and as effective as kNN on the 20NG collection. In addition, the performance of kNN is more stable than that of kINN and kSNN as the value of k changes. A parallel study was conducted to generate new features for documents from the similarity matrices produced by the selection criteria that gave the best results with the kNN, kINN and kSNN methods. SVM (considered a state-of-the-art method) was applied to the Reuters, 20NG and Ohsumed collections before and after this feature generation approach was applied to the documents, and the results showed statistically significant gains over the original collections.
