Error annotation in the COPLE2 corpus

del Rio, Iria; Mendes, Amália

Bibliographic Details
Main Author:	del Rio, Iria
Publication Date:	2018
Other Authors:	Mendes, Amália
Format:	Article
Language:	por
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	https://doi.org/10.26334/2183-9077/rapln4ano2018a42
Summary:	We present the general architecture of the error annotation system applied to the COPLE2 corpus, a learner corpus of Portuguese implemented on the TEITOK platform. We give a general overview of the corpus and of the TEITOK functionalities and describe how the error annotation is structured as a two-level system: first, a fully manual, token-based, coarse-grained annotation classifies errors into three broad categories and is paired with multi-level POS and lemma information; second, a multi-word, fine-grained standoff annotation is produced semi-automatically from the first level. The token-based level has been applied to 47% of the corpus. We compare our system with other error annotation proposals and discuss the fine-grained tag set and the experiments carried out to validate its applicability. An inter-annotator agreement (IAA) experiment using Cohen’s kappa was performed on both stages of the system and achieved good results at both levels. We explore how the token-level error annotation, POS and lemma can be used to generate the fine-grained error tags automatically through conversion scripts. The model is designed to reduce manual effort and to rapidly extend error annotation coverage to the full corpus. As the first error-annotated learner corpus of Portuguese, we expect COPLE2 to support new research in fields connected with Portuguese as a second/foreign language, such as Second Language Acquisition/Teaching and Computer-Assisted Learning.
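The summary reports inter-annotator agreement measured with Cohen’s kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label distribution. The Python sketch below illustrates that statistic only; it assumes each annotator assigns exactly one label per token, and the function name and the labels CAT_A, CAT_B, CAT_C in the usage example are hypothetical placeholders, not COPLE2's tag set, data or scripts.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each label, product of the two annotators' marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label everywhere; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical coarse labels for six tokens from two annotators.
    annotator_1 = ["CAT_A", "CAT_A", "CAT_B", "CAT_C", "CAT_A", "CAT_B"]
    annotator_2 = ["CAT_A", "CAT_B", "CAT_B", "CAT_C", "CAT_A", "CAT_B"]
    print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.739
```

In the usage example the annotators agree on 5 of 6 tokens (p_o = 5/6) and chance agreement is p_e = 13/36, so kappa = 17/23. Unlike raw percentage agreement, kappa discounts the matches two annotators would produce by chance when some categories are much more frequent than others.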

Item metadata

oai_identifier_str	oai:ojs3.ojs.apl.pt:article/42
repository_id_str	https://opendoar.ac.uk/repository/7160
dc.subject.por.fl_str_mv	learner corpus; error annotation; natural language processing; L2 acquisition
dc.date.none.fl_str_mv	2018-10-15
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
dc.relation.none.fl_str_mv	https://ojs.apl.pt/index.php/rapl/article/view/42 https://ojs.apl.pt/index.php/rapl/article/view/42/44
dc.rights.driver.fl_str_mv	Copyright (c) 2018 Iria del Rio, Amália Mendes info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Associação Portuguesa de Linguística
dc.source.none.fl_str_mv	Revista da Associação Portuguesa de Linguística; No. 4 (2018): Journal of the Portuguese Linguistics Association; 225-239; ISSN 2183-9077
instname_str	FCCN, digital services of FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt

Similar Items