A filter-stream-based approach to entity recognition in Twitter messages (original title: Uma abordagem baseada em fluxo de filtros para o reconhecimento de entidades em mensagens do twitter)

Diego Marinho de Oliveira

Bibliographic details
Year of defense:	2012
Main author:	Diego Marinho de Oliveira
Advisor:	Not provided by the institution
Examining committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	por
Defending institution:	Universidade Federal de Minas Gerais (UFMG)
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (translated from Portuguese):	Social networks; Microblogs; Twitter; Conditional random fields; Entity recognition; Computing; Online social networks
Access link:	http://hdl.handle.net/1843/ESBF-8ZKMCP
Abstract:	Named entity recognition is the task of locating and classifying elements in unstructured text using natural language processing techniques appropriate to the application domain. On the Web, this task is critical for identifying entities such as people, organizations, and places. Recently, microblogs such as Twitter and Tumblr have become a Web phenomenon and pose a new challenge for entity recognition. On Twitter, for example, a large volume of messages is exchanged in a short time, which makes both the recognition task and the extraction of information about a particular subject difficult. Moreover, the Twitter environment is highly dynamic and driven by data streams, so it requires tools and methods suited to these characteristics. Few works in the literature address this issue, however, which leaves a wide area of research open for named entity recognition in this environment. This master's thesis therefore proposes an alternative approach to the task, called FS-NER (Filter Stream Named Entity Recognition). FS-NER relies on filters that are applied independently and quickly, which makes the approach highly scalable and well suited to named entity recognition in the Twitter environment. To evaluate the effectiveness of the proposed approach, we carried out an exhaustive set of experiments on Twitter messages using three distinct collections: one with messages in English, one in Portuguese, and a third in several languages. The results show that, despite the simplicity of the filters used, the proposed approach outperformed an approach based on Conditional Random Fields, with an average improvement of 3% in the F1 metric. Moreover, the approach is orders of magnitude faster and therefore better suited to the data stream paradigm typical of Twitter.
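
The abstract reports its comparison in terms of the F1 metric, the harmonic mean of precision and recall. As a reader's aid, the sketch below shows one common way to compute it for entity recognition output. It is not taken from the thesis: the function name `precision_recall_f1`, the representation of a mention as a `(message_id, token_index, label)` tuple, and the sample data are all assumptions made for illustration, and the thesis may score matches differently.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold entity mentions.

    A mention counts as correct only if it appears in both sets (exact match).
    This matching rule is an assumption of this sketch, not the thesis's protocol.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: each mention is (message_id, token_index, label).
gold = {(1, 0, "PER"), (1, 4, "LOC"), (2, 2, "ORG"), (3, 1, "PER")}
predicted = {(1, 0, "PER"), (2, 2, "ORG"), (3, 5, "LOC")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this made-up example two of the three predicted mentions are correct and two of the four gold mentions are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.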
