Efeito do ranking sobre métricas de categorização multi-rótulo de texto [Effect of ranking on multi-label text categorization metrics]
Year of defense: | 2009 |
---|---|
Main author: | |
Advisor: | |
Examination committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | por |
Defending institution: | Universidade Federal do Espírito Santo, BR, Mestrado em Engenharia Elétrica, Centro Tecnológico, UFES, Programa de Pós-Graduação em Engenharia Elétrica |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://repositorio.ufes.br/handle/10/4070 |
Abstract: | A multi-label text categorization system typically ranks a set of predefined labels according to their appropriateness to a given document and then selects the top-ranking labels as the document's label set. Ties in the ranking can be broken in many different ways. Although the choice may affect the metrics used to evaluate a multi-label text categorizer, the issue seems to have been little addressed in the literature. In this work, we analyze the impact of different ranking methods on ten multi-label text categorization performance metrics: one-error, coverage, ranking loss, average precision, R-precision, Hamming loss, exact match, precision, recall, and F1. To this end, we first reformulate some of the metrics so that ties are taken into account. We then use them to evaluate the performance of four multi-label text categorization techniques, k-nearest neighbors (kNN), multi-label k-nearest neighbors (ML-kNN), virtual generalizing random access memory weightless neural networks (VG-RAM WNN), and VG-RAM Data Correlation (VG-RAM WNN-COR), on the categorization of two multi-label text databases with large numbers of labels (105 and 692 categories). We have found that, depending on the method adopted for ranking, the performance results differ significantly for many of the metrics in question, which suggests that the particular ranking method used should always be stated clearly when evaluating multi-label text categorization techniques. |
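
The abstract's central point, that the way ties are ranked can change a metric's value, can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the function names, the three tie-handling options (`"min"`, `"max"`, `"average"`), and the toy scores are all assumptions chosen for illustration, and coverage is computed with its common textbook definition rather than with the reformulated metrics the work proposes.

```python
from typing import Dict, Set


def rank_labels(scores: Dict[str, float], tie: str = "min") -> Dict[str, float]:
    """Rank labels by descending score (rank 1 is best).

    Hypothetical tie-handling options, for illustration only:
      "min"     - tied labels all receive the best rank of their group
      "max"     - tied labels all receive the worst rank of their group
      "average" - tied labels all receive the mean rank of their group
    """
    ranks: Dict[str, float] = {}
    for label, score in scores.items():
        higher = sum(1 for v in scores.values() if v > score)
        tied = sum(1 for v in scores.values() if v == score)  # counts the label itself
        if tie == "min":
            ranks[label] = higher + 1
        elif tie == "max":
            ranks[label] = higher + tied
        elif tie == "average":
            ranks[label] = higher + (tied + 1) / 2
        else:
            raise ValueError(f"unknown tie method: {tie}")
    return ranks


def coverage(ranks: Dict[str, float], relevant: Set[str]) -> float:
    """Rank of the worst-ranked relevant label, minus one (lower is better)."""
    return max(ranks[label] for label in relevant) - 1


# Toy document: three labels share the same score, and one of them is relevant.
scores = {"sports": 0.9, "politics": 0.5, "economy": 0.5, "health": 0.5, "arts": 0.1}
relevant = {"sports", "economy"}

for method in ("min", "max", "average"):
    print(method, coverage(rank_labels(scores, method), relevant))
```

With the same classifier output, coverage ranges from 1 to 3 depending only on how the three-way tie is ranked, which is the kind of discrepancy the abstract says should be reported alongside the results.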