QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages

Bibliographic Details
Main Author: Otegi, Arantxa
Publication Date: 2016
Other Authors: Aranberri, Nora; Branco, António; Hajic, Jan; Neale, Steven; Osenova, Petya; Pereira, Rita; Popel, Martin; Silva, João; Simov, Kiril; Agirre, Eneko
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10451/33107
Summary: This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all parallel corpora, with translations in five languages, namely Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.
id RCAP_dc5ae83eaa57537a588e81130a1eb988
oai_identifier_str oai:repositorio.ulisboa.pt:10451/33107
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
spellingShingle QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
Otegi, Arantxa
Annotated parallel corpora
Named-entity disambiguation
Word sense disambiguation
Coreference
title_short QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title_full QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title_fullStr QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title_full_unstemmed QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title_sort QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
author Otegi, Arantxa
author_facet Otegi, Arantxa
Aranberri, Nora
Branco, António
Hajic, Jan
Neale, Steven
Osenova, Petya
Pereira, Rita
Popel, Martin
Silva, João
Simov, Kiril
Agirre, Eneko
author_role author
author2 Aranberri, Nora
Branco, António
Hajic, Jan
Neale, Steven
Osenova, Petya
Pereira, Rita
Popel, Martin
Silva, João
Simov, Kiril
Agirre, Eneko
author2_role author
author
author
author
author
author
author
author
author
author
dc.contributor.none.fl_str_mv Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Otegi, Arantxa
Aranberri, Nora
Branco, António
Hajic, Jan
Neale, Steven
Osenova, Petya
Pereira, Rita
Popel, Martin
Silva, João
Simov, Kiril
Agirre, Eneko
dc.subject.por.fl_str_mv Annotated parallel corpora
Named-entity disambiguation
Word sense disambiguation
Coreference
topic Annotated parallel corpora
Named-entity disambiguation
Word sense disambiguation
Coreference
description This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all parallel corpora, with translations in five languages, namely Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.
publishDate 2016
dc.date.none.fl_str_mv 2016
2016-01-01T00:00:00Z
2018-05-04T10:22:57Z
dc.type.driver.fl_str_mv conference object
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/33107
url http://hdl.handle.net/10451/33107
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Otegi, A., N. Aranberri, A. Branco, J. Hajic, S. Neale, P. Osenova, Rita Valadas Pereira, M. Popel, J. Silva, K. Simov, & E. Agirre. "QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages". In Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), Portoroz, Slovenia, 23-28 May 2016.
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv European Language Resources Association
publisher.none.fl_str_mv European Language Resources Association
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833601540974182400