Error annotation in the COPLE2 corpus

Bibliographic Details
Main Author: del Río, Iria
Publication Date: 2018
Other Authors: Mendes, Amália
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10451/36512
Summary: We present the general architecture of the error annotation system applied to the COPLE2 corpus, a learner corpus of Portuguese implemented on the TEITOK platform. We give a general overview of the corpus and of the TEITOK functionalities and describe how the error annotation is structured as a two-level system. First, a fully manual, token-based, coarse-grained annotation is applied, producing a rough classification of the errors into three categories, paired with multi-level information for POS and lemma. Second, a multi-word, fine-grained standoff annotation is semi-automatically produced from the first level of annotation. The token-based level has been applied to 47% of the total corpus. We compare our system with other error annotation proposals, and discuss the fine-grained tag set and the experiments carried out to validate its applicability. An inter-annotator agreement (IAA) experiment was performed on the two stages of our system using Cohen’s kappa, and it achieved good results on both levels. We explore the possibilities offered by the token-level error annotation, POS and lemma for automatically generating the fine-grained error tags by applying conversion scripts. The model is designed to reduce manual effort and rapidly increase the coverage of the error annotation over the full corpus. As the first learner corpus of Portuguese with error annotation, we expect COPLE2 to support new research in different fields connected with Portuguese as a second/foreign language, such as Second Language Acquisition/Teaching or Computer-Assisted Learning.
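Note on the agreement measure: the summary reports inter-annotator agreement with Cohen's kappa, which corrects the observed agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of that statistic, not the authors' evaluation script. The function name cohen_kappa and the labels "A", "B" and "C" are hypothetical placeholders; the record does not name the three coarse-grained categories.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    labels_a, labels_b: equal-length sequences of category labels,
    one label per annotated token (hypothetical input format).
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two annotators'
    # marginal proportions, summed over categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined, so this sketch reports full agreement by convention.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Placeholder labels only; these are not data from the COPLE2 experiment.
annotator_1 = ["A", "A", "B", "B", "C", "C"]
annotator_2 = ["A", "A", "B", "C", "C", "C"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.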
id RCAP_31f22941f688e3a78c1b33dc91bf45ed
oai_identifier_str oai:repositorio.ulisboa.pt:10451/36512
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
title Error annotation in the COPLE2 corpus
author del Río, Iria
author_facet del Río, Iria
Mendes, Amália
author_role author
author2 Mendes, Amália
author2_role author
dc.contributor.none.fl_str_mv Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv del Río, Iria
Mendes, Amália
dc.subject.por.fl_str_mv Learner corpus
Error annotation
Second language acquisition
Natural language processing
Corpus de aprendentes
Anotação do erro
Aquisição de língua segunda
Processamento de língua natural
publishDate 2018
dc.date.none.fl_str_mv 2018-09-22
2018-09-22T00:00:00Z
2019-01-18T10:11:44Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/36512
url http://hdl.handle.net/10451/36512
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv del Río, I., & Mendes, A. (2018). Error annotation in the COPLE2 corpus. Revista Da Associação Portuguesa De Linguística, (4), 225-239. https://doi.org/10.26334/2183-9077/rapln4ano2018a42
2183-9077
https://doi.org/10.26334/2183-9077/rapln4ano2018a42
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Associação Portuguesa de Linguística
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt