Bibliographic details
Year of defense: 2017
Main author: FONCECA JUNIOR, José Ilberto
Advisor: FIGUEIRÊDO, Pedro Hugo de
Examining committee:
FIGUEIRÊDO, Pedro Hugo de,
SOUZA, Adauto José Ferreira de,
GONZÁLEZ, Ramón Enrique Ramayo
Document type: Dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal Rural de Pernambuco
Graduate program: Programa de Pós-Graduação em Física Aplicada (Graduate Program in Applied Physics)
Department: Departamento de Física (Department of Physics)
Country: Brazil
Access link: http://www.tede2.ufrpe.br:8080/tede2/handle/tede2/7799
Abstract:
The application of mathematical and statistical methods to exploit properties of natural languages has a recent and prolific history. These methods, together with the quantitative techniques adapted and created through the study of languages, belong to an area usually called quantitative linguistics. The first work in this area was carried out by George Zipf between 1930 and 1950, and it studied the distribution of word frequencies. His work was followed by Claude Shannon's analysis of entropy and letter prediction as a measure of redundancy in written English. In this work, we first present a study of correlation and cross-correlation in time series extracted from texts, using common approaches for investigating non-stationary time series. For this analysis we used a corpus of 250 literary texts in 10 different languages. We also discuss and explain the properties that emerge from these correlations. Second, we turn to the distance distribution responsible for the long-range structure observed in written language. We model this distribution using the distances between consecutive prime numbers and distances drawn from a Weibull-distributed process. We then scrutinize the results of these models with the techniques presented in this work and compare them with the properties that emerge in natural language.
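The abstract does not name its specific methods, so the following is only a minimal sketch of how the two ingredients could fit together, under stated assumptions. It assumes first-order detrended fluctuation analysis (DFA-1) as the "common approach" for non-stationary series. It represents a text as a hypothetical binary occurrence series (1 where a given word occurs, 0 elsewhere) whose inter-occurrence distances come either from gaps between consecutive primes or from a Weibull distribution. The function names, the Weibull scale 8.0 and shape 0.7, and the window sizes are illustrative choices and are not taken from the dissertation.

```python
import math
import random


def prime_gaps(limit):
    """Distances between consecutive primes below `limit` (sieve of Eratosthenes, limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(limit - 1) + 1):
        if sieve[p]:
            # Zero out every multiple of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    primes = [i for i, is_prime in enumerate(sieve) if is_prime]
    return [b - a for a, b in zip(primes, primes[1:])]


def weibull_gaps(count, scale, shape, rng):
    """`count` integer distances (at least 1) drawn from Weibull(scale, shape).

    Hypothetical model: continuous Weibull samples are rounded to whole positions.
    """
    return [max(1, round(rng.weibullvariate(scale, shape))) for _ in range(count)]


def occurrence_series(gaps):
    """Hypothetical word-occurrence series: 1 at each cumulative position, first hit at index 0."""
    positions = [0]
    for gap in gaps:
        positions.append(positions[-1] + gap)
    series = [0] * (positions[-1] + 1)
    for pos in positions:
        series[pos] = 1
    return series


def dfa_exponent(series, scales):
    """DFA-1 exponent: least-squares slope of log F(n) against log n.

    F(n) is the RMS deviation of the integrated, mean-removed series (the profile)
    from a straight line fitted separately in each non-overlapping window of size n.
    """
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for x in series:
        acc += x - mean
        profile.append(acc)
    log_n, log_f = [], []
    for n in scales:
        windows = len(profile) // n
        if n < 3 or windows < 1:
            raise ValueError(f"scale {n} is invalid for a series of length {len(series)}")
        t_mean = (n - 1) / 2
        s_tt = sum((t - t_mean) ** 2 for t in range(n))
        sq_sum = 0.0
        for w in range(windows):
            seg = profile[w * n:(w + 1) * n]
            seg_mean = sum(seg) / n
            # Ordinary least-squares linear trend inside this window.
            slope = sum((t - t_mean) * (y - seg_mean) for t, y in enumerate(seg)) / s_tt
            sq_sum += sum((y - seg_mean - slope * (t - t_mean)) ** 2 for t, y in enumerate(seg))
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_sum / (windows * n)))
    # Least-squares slope of log F versus log n.
    x_bar = sum(log_n) / len(log_n)
    y_bar = sum(log_f) / len(log_f)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_n, log_f))
    den = sum((x - x_bar) ** 2 for x in log_n)
    return num / den


if __name__ == "__main__":
    rng = random.Random(42)
    scales = [8, 16, 32, 64, 128, 256, 512]
    prime_series = occurrence_series(prime_gaps(20000))
    weibull_series = occurrence_series(weibull_gaps(2000, 8.0, 0.7, rng))
    print("prime-gap series alpha:", round(dfa_exponent(prime_series, scales), 3))
    print("Weibull-gap series alpha:", round(dfa_exponent(weibull_series, scales), 3))
```

In DFA, an exponent near 0.5 indicates no long-range correlation, and values above 0.5 indicate persistent correlations. Comparing the exponents of such synthetic series with those of real word-occurrence series is one possible way to check the models against natural language.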