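Processamento de linguagem natural: caracterização da produção científica dos pesquisadores brasileiros [Natural language processing: characterization of the scientific production of Brazilian researchers]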

Bibliographic details
Year of defense: 2010
Main author: Ana Paula Ladeira
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por)
Institution of defense: Universidade Federal de Minas Gerais (UFMG)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/ECID-8B3Q6C
Abstract: Natural language processing (NLP) research has been carried out by researchers from areas such as computer science, information science, and linguistics. This thesis uses the knowledge accumulated over the past 40 years in NLP and published in ARIST as a reference to select and analyze the scientific production of the Brazilian academic community in the area. Brazilian publications about NLP were collected automatically from the Lattes database (http://lattes.cnpq.br/). The tool for automatic selection of NLP publications from the Lattes database was built by analyzing the subjects of ARIST review articles. A total of 621 publications were automatically identified as related to NLP and retrieved from the Lattes database. A random sample of 68 papers from this total was submitted to content analysis, which made it possible to identify the main NLP issues discussed by the Brazilian scientific community. The majority of Brazilian publications were published after the year 2000. The participation of information science in NLP publications has been very modest, whereas computer science and linguistics were responsible for almost 85% of Brazilian production. Twelve researchers were responsible for more than 20% of all Brazilian production; among them, nine were from computer science, two from linguistics, and one from electrical engineering. It is also noteworthy that seven of these twelve researchers belonged to a single research group working on computational linguistics, NILC - Núcleo Interinstitucional de Lingüística Computacional (http://nilc.icmc.sc.usp.br/). Among the most discussed issues, we observed the following: translation was discussed intensively in the 1990s, indexing studies decreased after the 1980s, studies on classification became inactive during the 1990s, and there is a clear trend in NLP toward developing automatic summarization.
Another aspect revealed by the analysis was that information science has focused mainly on automatic indexing and content analysis, while computer science has focused primarily on automatic translation and summarization. The content analysis performed on the 68 sampled publications showed that information retrieval was the most prominent issue in Brazilian scientific production. Only two of the papers on summarization used a deep approach to produce summaries; most research on automatic summarization relied on an empirical approach to generate extracts. Research on automatic translation using statistical methods and research using transfer rules obtained very similar results. Brazilian studies on NLP involve disciplines other than information science. These studies should be better known by information science researchers, who can benefit from the computational tools developed, which can be applied to classical processes such as cataloging, information representation, and retrieval.