Protocolo para anotação linguística e gerenciamento de amostras sociolinguísticas : o caso da amostra Deslocamentos 2019

Sousa, Marta Deysiane Alves Faria

Bibliographic details
Defense year:	2023
Main author:	Sousa, Marta Deysiane Alves Faria
Advisor:	Freitag, Raquel Meister Ko.
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	por
Defense institution:	Not provided by the institution
Graduate program:	Pós-Graduação em Letras
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese (translated):	Sociolinguistics; Science; Sampling (Statistics); Tools; Variationist sociolinguistics; Open science; Speech data; Linguistic annotation; Natural language processing (NLP)
Keywords in English:	Variationist sociolinguistics; Open science; Speech data; Linguistic annotation
CNPq knowledge area:	LINGUISTICS, LETTERS AND ARTS::LETTERS
Access link:	https://ri.ufs.br/jspui/handle/riufs/18363
Abstract:	Linguistic databases are tools that give researchers fast access to language samples (written or oral texts), allow data from different regions to be cross-compared, and preserve a linguistic record of a given time and place; they serve didactic as well as scientific purposes (FREITAG; MARTINS; TAVARES, 2012; GONÇALVES, 2019; SILVA, 2015). In both the Brazilian and international scenarios, there is concern with the documentation and archiving of sociolinguistic samples, which can be explained by the importance of these data for advancing research in the field (KENDALL, 2013), by Open Science demands for data sharing, and by technological advances in archiving and linguistic annotation (VANN, 2021). However, in Brazil as in the international scenario, such endeavors have been made individually, without standardized methodologies, codes, or data availability, which makes it difficult to replicate studies and, consequently, to compare variable phenomena across databases. In addition, none of the sociolinguistic samples already available online is linguistically tagged, and none provides data storage and management protocols or code for statistical analysis. With this study, we aim to create a protocol to systematize and disseminate the Deslocamentos 2019 sample (Displacements 2019; FREITAG, 2018) from the Falares Sergipanos database, following Open Science principles. Our thesis is that it is possible to use open and free resources to linguistically tag and systematize sociolinguistic samples according to the Open Science paradigm.
To support our thesis, we set the following specific goals: i) to test two free computational tools (LancsBox 6.0 and spaCy 3.5) for linguistically annotating the sample Displacements 2019; ii) to evaluate the annotation performed by each tool; iii) to compare the two tools' performance in searches and functionalities for a pre-analysis of one phenomenon, the filling of the determiner position before pre-nominal possessives; iv) to describe actions to disseminate and share the data of the sample Displacements 2019; v) to organize the actions taken into a protocol. The general results confirm our thesis that it is possible to systematize and linguistically annotate sociolinguistic samples using only free resources available for the Portuguese language. The tools tested also enabled searches that are more accurate and return a greater number of occurrences of the phenomenon than a manual search. On the other hand, hosting and storing a website with a large amount of data using free resources remains a limitation.
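The kind of search named in goal iii can be illustrated with a minimal sketch. It is not the thesis's procedure and does not reproduce LancsBox or spaCy output: the tag labels (`ART`, `POSS`, `NOUN`), the function name `find_prenominal_possessives`, and the example sentence are all assumptions made for illustration. It only shows how, once tokens carry part-of-speech tags, pre-nominal possessives can be located and classified by whether a determiner fills the position before them.

```python
from typing import List, Tuple

# A token is (word form, tag). The tag labels are a simplified, hypothetical
# tagset chosen for this sketch; they are not the output of any real tagger.
Token = Tuple[str, str]


def find_prenominal_possessives(tokens: List[Token]) -> List[Tuple[str, str, bool]]:
    """Return (possessive, noun, has_determiner) for each possessive
    that immediately precedes a noun."""
    hits = []
    for i, (form, tag) in enumerate(tokens):
        if tag != "POSS":
            continue
        # Pre-nominal position: the next token must be a noun.
        if i + 1 >= len(tokens) or tokens[i + 1][1] != "NOUN":
            continue
        # The determiner position is filled if an article comes right before.
        has_determiner = i > 0 and tokens[i - 1][1] == "ART"
        hits.append((form, tokens[i + 1][0], has_determiner))
    return hits


# Hand-tagged, invented example: "a minha casa fica perto de meu trabalho"
sample = [
    ("a", "ART"), ("minha", "POSS"), ("casa", "NOUN"), ("fica", "VERB"),
    ("perto", "ADV"), ("de", "ADP"), ("meu", "POSS"), ("trabalho", "NOUN"),
]

for possessive, noun, has_determiner in find_prenominal_possessives(sample):
    print(possessive, noun, "with determiner" if has_determiner else "no determiner")
```

A search over tagged tokens like this one can catch occurrences that a manual read might miss, which is the type of gain the abstract reports for the tools it tested.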
