Use of links in the classification of documents in digital collections
Year of defense: | 2007 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | Portuguese (por) |
Defending institution: | Universidade Federal de Minas Gerais (UFMG) |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords (Portuguese): | |
Access link: | http://hdl.handle.net/1843/RVMR-7AAQQJ |
Abstract: | In this work, we show how information derived from links among Web documents can be used to solve the document classification problem. The most obvious form of link between two Web documents is a hyperlink connecting them. But links can also be derived from references among documents of digital collections hosted on the Web, for instance, from citations among articles in digital libraries and encyclopedias. Specifically, we study how measures derived from link information, called bibliometric measures, can improve the accuracy of classification systems. As bibliometric measures, we used co-citation, bibliographic coupling, and Amsler. We obtained distinct classifiers by applying bibliometric and text-based measures to the traditional k-nearest neighbors (kNN) and Support Vector Machine (SVM) classification methods. Bibliometric measures proved effective for document classification whenever certain characteristics of the link distribution are present in the collection. Most of the documents that the classifier based on bibliometric measures misclassified proved difficult to classify even for humans. We also propose a new way of combining the results of bibliometric-measure-based classifiers and text-based classifiers. In experiments performed on three distinct collections, the combined approach achieved better results than each classifier in isolation. |
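The three bibliometric measures named in the abstract, and their use inside a kNN classifier, can be illustrated with a short Python sketch. It assumes a citation graph stored as a mapping from each document to the set of documents it cites, and it normalizes every measure with a Jaccard-style ratio (shared neighbors divided by all neighbors); the thesis may normalize differently. Co-citation compares the sets of documents that cite two documents, bibliographic coupling compares the sets of documents they cite, and Amsler compares the union of both neighborhoods. The function names (`make_measures`, `knn_classify`), the similarity-weighted voting rule, and the toy graph are hypothetical, and the thesis's proposed combination of bibliometric and text-based classifiers is not reproduced here.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Set

# Hypothetical representation: doc id -> set of doc ids it cites (out-links).
Graph = Dict[str, Set[str]]
Similarity = Callable[[str, str], float]


def invert(citations: Graph) -> Graph:
    """Build the in-link map: doc id -> set of doc ids that cite it."""
    cited_by: Graph = defaultdict(set)
    for src, targets in citations.items():
        for dst in targets:
            cited_by[dst].add(src)
    return dict(cited_by)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared neighbors over all neighbors (an assumed normalization)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_measures(citations: Graph) -> Dict[str, Similarity]:
    cited_by = invert(citations)

    def out_links(d: str) -> Set[str]:
        return citations.get(d, set())

    def in_links(d: str) -> Set[str]:
        return cited_by.get(d, set())

    return {
        # Co-citation: overlap of the documents that cite d1 and d2.
        "co-citation": lambda d1, d2: jaccard(in_links(d1), in_links(d2)),
        # Bibliographic coupling: overlap of the documents cited by d1 and d2.
        "coupling": lambda d1, d2: jaccard(out_links(d1), out_links(d2)),
        # Amsler: overlap of the combined in- and out-link neighborhoods.
        "amsler": lambda d1, d2: jaccard(in_links(d1) | out_links(d1),
                                         in_links(d2) | out_links(d2)),
    }


def knn_classify(doc: str, labels: Dict[str, str], sim: Similarity,
                 k: int = 3) -> Optional[str]:
    """Similarity-weighted kNN vote (hypothetical voting rule)."""
    scored = [(sim(doc, t), t) for t in labels if t != doc]
    scored = [(s, t) for s, t in scored if s > 0]
    if not scored:
        return None  # no shared links with any training document
    scored.sort(key=lambda st: st[0], reverse=True)
    votes: Dict[str, float] = defaultdict(float)
    for s, t in scored[:k]:
        votes[labels[t]] += s
    return max(votes, key=votes.get)


# Invented toy collection: "q" is the document to classify.
citations = {
    "a": {"x", "y"}, "b": {"x", "y"},
    "c": {"z"}, "d": {"z", "w"},
    "q": {"x", "y"},
}
train = {"a": "DB", "b": "DB", "c": "IR", "d": "IR"}
measures = make_measures(citations)
for name, sim in measures.items():
    print(name, knn_classify("q", train, sim))
# co-citation None  (nothing cites "q")
# coupling DB
# amsler DB
```

Returning `None` when a document shares no links with any training document reflects the abstract's observation that bibliometric measures help only when the collection's links are suitably distributed. In a combined system, one possible choice (an assumption here, not the thesis's method) is to fall back on a text-based classifier for such documents.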