The impact of NLP techniques in the multilabel text classification problem

Bibliographic Details
Main Author: Gonçalves, Teresa
Publication Date: 2004
Other Authors: Quaresma, Paulo
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10174/2558
Summary: Support Vector Machines have been used successfully to classify text documents into sets of concepts. Typically, however, linguistic information is either not used in the classification process or its use has not been fully evaluated. We apply and evaluate two basic linguistic procedures (stop-word removal and stemming/lemmatization) in the multilabel text classification problem. These procedures are applied to the Reuters dataset and to Portuguese juridical documents from the Supreme Courts and the Attorney General’s Office.
id RCAP_4f6e13ffb7f611e5be0ca3dc916f5273
oai_identifier_str oai:dspace.uevora.pt:10174/2558
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv The impact of NLP techniques in the multilabel text classification problem
title The impact of NLP techniques in the multilabel text classification problem
author Gonçalves, Teresa
author_facet Gonçalves, Teresa
Quaresma, Paulo
author_role author
author2 Quaresma, Paulo
author2_role author
dc.contributor.author.fl_str_mv Gonçalves, Teresa
Quaresma, Paulo
dc.subject.por.fl_str_mv machine learning
Text classification
topic machine learning
Text classification
publishDate 2004
dc.date.none.fl_str_mv 2004-01-01T00:00:00Z
2011-02-15T11:25:04Z
2011-02-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/2558
url http://hdl.handle.net/10174/2558
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 424-428
Advances in Soft Computing
open access
tcg@uevora.pt
pq@uevora.pt
IIPWM-04, Intelligent Information Processing and Web Mining
Klopotek, M.
Wierzchon, S.
Trojanowski, K.
498
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 168602 bytes
application/pdf
dc.publisher.none.fl_str_mv Springer-Verlag
publisher.none.fl_str_mv Springer-Verlag
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833592299719753728