QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
Main Author: | Otegi, Arantxa |
---|---|
Publication Date: | 2016 |
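Other Authors: | Aranberri, Nora; Branco, António; Hajic, Jan; Neale, Steven; Osenova, Petya; Pereira, Rita; Popel, Martin; Silva, João; Simov, Kiril; Agirre, Eneko |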
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10451/33107 |
Summary: | This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all parallel corpora, with translations in five languages, namely, Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer. |
id |
RCAP_dc5ae83eaa57537a588e81130a1eb988 |
---|---|
oai_identifier_str |
oai:repositorio.ulisboa.pt:10451/33107 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages Annotated parallel corpora Named-entity disambiguation Word sense disambiguation Coreference This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all parallel corpora, with translations in five languages, namely, Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer. European Language Resources Association Repositório da Universidade de Lisboa Otegi, Arantxa Aranberri, Nora Branco, António Hajic, Jan Neale, Steven Osenova, Petya Pereira, Rita Popel, Martin Silva, João Simov, Kiril Agirre, Eneko 2018-05-04T10:22:57Z 2016 2016-01-01T00:00:00Z conference object info:eu-repo/semantics/publishedVersion application/pdf http://hdl.handle.net/10451/33107 eng Otegi, A., N. Aranberri, A. Branco, J. Hajic, S. Neale, P. Osenova, Rita Valadas Pereira, M. Popel, J. Silva, K. Simov, & E. Agirre. "QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages". In Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), Portoroz, Slovenia, 23-28 May 2016. info:eu-repo/semantics/openAccess reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP 2025-03-17T13:52:59Z oai:repositorio.ulisboa.pt:10451/33107 Portal Agregador ONG https://www.rcaap.pt/oai/openaire info@rcaap.pt opendoar:https://opendoar.ac.uk/repository/7160 2025-05-29T02:56:55.203822 Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia false |
dc.title.none.fl_str_mv |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages |
title |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages |
spellingShingle |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages Otegi, Arantxa Annotated parallel corpora Named-entity disambiguation Word sense disambiguation Coreference |
title_short |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages |
title_full |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages |
title_fullStr |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages |
title_full_unstemmed |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages |
title_sort |
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages |
author |
Otegi, Arantxa |
author_facet |
Otegi, Arantxa Aranberri, Nora Branco, António Hajic, Jan Neale, Steven Osenova, Petya Pereira, Rita Popel, Martin Silva, João Simov, Kiril Agirre, Eneko |
author_role |
author |
author2 |
Aranberri, Nora Branco, António Hajic, Jan Neale, Steven Osenova, Petya Pereira, Rita Popel, Martin Silva, João Simov, Kiril Agirre, Eneko |
author2_role |
author author author author author author author author author author |
dc.contributor.none.fl_str_mv |
Repositório da Universidade de Lisboa |
dc.contributor.author.fl_str_mv |
Otegi, Arantxa Aranberri, Nora Branco, António Hajic, Jan Neale, Steven Osenova, Petya Pereira, Rita Popel, Martin Silva, João Simov, Kiril Agirre, Eneko |
dc.subject.por.fl_str_mv |
Annotated parallel corpora Named-entity disambiguation Word sense disambiguation Coreference |
topic |
Annotated parallel corpora Named-entity disambiguation Word sense disambiguation Coreference |
description |
This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all parallel corpora, with translations in five languages, namely, Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer. |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016 2016-01-01T00:00:00Z 2018-05-04T10:22:57Z |
dc.type.driver.fl_str_mv |
conference object |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/33107 |
url |
http://hdl.handle.net/10451/33107 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Otegi, A., N. Aranberri, A. Branco, J. Hajic, S. Neale, P. Osenova, Rita Valadas Pereira, M. Popel, J. Silva, K. Simov, & E. Agirre. "QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages". In Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), Portoroz, Slovenia, 23-28 May 2016. |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
European Language Resources Association |
publisher.none.fl_str_mv |
European Language Resources Association |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833601540974182400 |