QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages

Otegi, Arantxa; Aranberri, Nora; Branco, António; Hajic, Jan; Neale, Steven; Osenova, Petya; Pereira, Rita; Popel, Martin; Silva, João; Simov, Kiril; Agirre, Eneko

Bibliographic Details
Main Author:	Otegi, Arantxa
Publication Date:	2016
Other Authors:	Aranberri, Nora; Branco, António; Hajic, Jan; Neale, Steven; Osenova, Petya; Pereira, Rita; Popel, Martin; Silva, João; Simov, Kiril; Agirre, Eneko
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	http://hdl.handle.net/10451/33107
Summary:	This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all the parallel corpora, with translations into five languages: Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.

Item metadata

id	RCAP_dc5ae83eaa57537a588e81130a1eb988
oai_identifier_str	oai:repositorio.ulisboa.pt:10451/33107
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
spellingShingle	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages Otegi, Arantxa Annotated parallel corpora Named-entity disambiguation Word sense disambiguation Coreference
title_short	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title_full	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title_fullStr	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title_full_unstemmed	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
title_sort	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
author	Otegi, Arantxa
author_facet	Otegi, Arantxa Aranberri, Nora Branco, António Hajic, Jan Neale, Steven Osenova, Petya Pereira, Rita Popel, Martin Silva, João Simov, Kiril Agirre, Eneko
author_role	author
author2	Aranberri, Nora Branco, António Hajic, Jan Neale, Steven Osenova, Petya Pereira, Rita Popel, Martin Silva, João Simov, Kiril Agirre, Eneko
author2_role	author author author author author author author author author author
dc.contributor.none.fl_str_mv	Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv	Otegi, Arantxa Aranberri, Nora Branco, António Hajic, Jan Neale, Steven Osenova, Petya Pereira, Rita Popel, Martin Silva, João Simov, Kiril Agirre, Eneko
dc.subject.por.fl_str_mv	Annotated parallel corpora Named-entity disambiguation Word sense disambiguation Coreference
topic	Annotated parallel corpora Named-entity disambiguation Word sense disambiguation Coreference
description	This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all the parallel corpora, with translations into five languages: Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.
publishDate	2016
dc.date.none.fl_str_mv	2016 2016-01-01T00:00:00Z 2018-05-04T10:22:57Z
dc.type.driver.fl_str_mv	conference object
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10451/33107
url	http://hdl.handle.net/10451/33107
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Otegi, A., N. Aranberri, A. Branco, J. Hajic, S. Neale, P. Osenova, Rita Valadas Pereira, M. Popel, J. Silva, K. Simov, & E. Agirre. "QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages". In Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), Portorož, Slovenia, 23-28 May 2016.
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	European Language Resources Association
publisher.none.fl_str_mv	European Language Resources Association
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833601540974182400

Similar Items