The impact of NLP techniques in the multilabel text classification problem
Main Author: | Gonçalves, Teresa |
---|---|
Publication Date: | 2004 |
Other Authors: | Quaresma, Paulo |
Format: | Article |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full text: | http://hdl.handle.net/10174/2558 |
Summary: | Support Vector Machines have been used successfully to classify text documents into sets of concepts. However, linguistic information is typically not used in the classification process, or its use has not been fully evaluated. We apply and evaluate two basic linguistic procedures (stop-word removal and stemming/lemmatization) in the multilabel text classification problem. These procedures are applied to the Reuters dataset and to Portuguese juridical documents from the Supreme Courts and the Attorney General’s Office. |
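The two linguistic procedures named in the summary act as a preprocessing step that runs before any classifier sees the text. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the stop-word list and the suffix-stripping rule are hypothetical toy stand-ins (a real setup would use a full stop list and a Porter-style stemmer or a lemmatizer, with Portuguese resources for the juridical corpus), and the multilabel problem is decomposed by binary relevance, i.e. one +1/-1 target vector per label, which is one common way to feed a multilabel problem to binary classifiers such as SVMs.

```python
import re
from collections import Counter

# Hypothetical toy stop-word list (an assumption for illustration); real
# experiments would use a full English or Portuguese stop list.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "for", "on"}

# Hypothetical suffixes for a crude suffix-stripping stemmer. This is NOT the
# Porter stemmer or a lemmatizer; longer suffixes are listed first so that,
# e.g., "ations" is tried before "s".
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "ed", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stop_words=True, stem=True):
    """Turn raw text into a bag-of-words Counter, optionally applying the
    two linguistic procedures: stop-word removal and stemming."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return Counter(tokens)


def binary_relevance_targets(label_sets, all_labels):
    """Decompose a multilabel problem into one +1/-1 target vector per label,
    so that each label can be learned by a separate binary classifier."""
    return {
        label: [1 if label in labels else -1 for labels in label_sets]
        for label in all_labels
    }


if __name__ == "__main__":
    docs = ["The documents are classified in concepts", "Grain exports rose"]
    labels = [{"earn", "acq"}, {"grain"}]  # example topics, Reuters-21578 style
    for doc in docs:
        print(sorted(preprocess(doc).items()))
    print(binary_relevance_targets(labels, ["acq", "earn", "grain"]))
```

Toggling `remove_stop_words` and `stem` yields the four preprocessing variants one would compare; each per-label target vector would then train one binary classifier on the resulting bag-of-words features.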

Record ID: | RCAP_4f6e13ffb7f611e5be0ca3dc916f5273 |
---|---|
OAI Identifier: | oai:dspace.uevora.pt:10174/2558 |
Network: | RCAP, Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Repository ID (OpenDOAR): | https://opendoar.ac.uk/repository/7160 |
Authors: | Gonçalves, Teresa; Quaresma, Paulo |
Keywords: | machine learning; Text classification |
Dates: | 2004 (published); 2011-02-15 (added to the repository) |
Type: | Article (published version) |
Published in: | IIPWM-04, Intelligent Information Processing and Web Mining (series: Advances in Soft Computing), pp. 424-428 |
Editors: | Klopotek, M.; Wierzchon, S.; Trojanowski, K. |
Contacts: | tcg@uevora.pt; pq@uevora.pt |
Other: | livre (free); 498 |
Rights: | Open access |
File: | PDF (application/pdf), 168602 bytes |
Publisher: | Springer-Verlag |
Repository Institution: | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
Repository Contact: | info@rcaap.pt |