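Sintagmas nominais na indexação automática: uma análise estrutural da distribuição de termos relevantes em teses de doutorado da UFMG [Noun phrases in automatic indexing: a structural analysis of the distribution of relevant terms in UFMG doctoral theses]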

Bibliographic details
Year of defense: 2012
Main author: Luiz Antonio Lopes Mesquita
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: por
Institution of defense: Universidade Federal de Minas Gerais (UFMG)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/ECID-943NXP
Abstract: This thesis examined whether relevant terms are distributed through a scientific text in a characteristic way that could serve as a criterion for automatic indexing. The distribution was analyzed in two ways: linearly, from the beginning to the end of the text, and structurally, by the part of the text in which a term occurs (introduction, development, and conclusion). Only noun phrases contained in the texts were considered as terms. The corpus comprised 98 doctoral theses from the eight knowledge areas of UFMG. For each text, 20 noun phrases were first selected as candidate descriptors. In interviews, the authors of the theses rated the importance of each noun phrase as a descriptor of their own work; 77.9% of the candidates were considered relevant. The descriptors' relevance values were then associated with their positions in the text. For the linear analysis, the values were consolidated into ten equal, consecutive portions of the text; for the structural analysis, they were grouped by structural part. Taken together, the texts showed a single characteristic behavior, and texts from the natural sciences and from the social sciences each showed a characteristic behavior as well. All of these behaviors, including the general one, were described by polynomial equations and can be applied as a criterion for automatic indexing.
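The linear analysis described in the abstract (relevance values consolidated into ten equal, consecutive portions, then summarized by a polynomial) can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the use of integer character offsets as positions, the least-squares fitting method, the polynomial degree, and the sample data are all assumptions made for illustration only.

```python
def decile_profile(occurrences, text_length, bins=10):
    """Sum relevance scores into `bins` equal, consecutive portions of a text.

    occurrences: iterable of (position, relevance) pairs, where position is an
    integer offset with 0 <= position < text_length (an assumed representation).
    """
    totals = [0.0] * bins
    for position, relevance in occurrences:
        # Map the offset to its portion; clamp so the last offset stays in range.
        index = min(position * bins // text_length, bins - 1)
        totals[index] += relevance
    return totals


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [c0, c1, ..., c_degree] for the model c0 + c1*x + ... + c_degree*x**degree.
    The degree is a free choice here; the abstract does not state which was used.
    """
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical data: (offset, author-rated relevance) for one 1000-character text.
    sample = [(12, 3.0), (140, 2.0), (480, 1.0), (910, 2.0), (995, 3.0)]
    profile = decile_profile(sample, text_length=1000)
    midpoints = [(i + 0.5) / 10 for i in range(10)]  # portion centers in [0, 1]
    print(profile)
    print(fit_polynomial(midpoints, profile, degree=2))
```

Under these assumptions, a fitted curve of this kind could be used to weight a candidate noun phrase by where it occurs in a new text; how the thesis actually applies its equations is not specified in the abstract.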