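Association rules and temporal correlation to populate and detect inconsistencies in large knowledge bases (original title: Regras de associação e correlação temporal para popular e detectar inconsistências em grandes bases de conhecimento)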

Bibliographic details
Year of defense: 2017
Main author: Miani, Rafael Garcia Leonel
Advisor: Hruschka Júnior, Estevam Rafael
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por)
Institution: Universidade Federal de São Carlos, Campus São Carlos
Graduate program: Programa de Pós-Graduação em Ciência da Computação (PPGCC)
Department: Not provided by the institution
Country: Not provided by the institution
Keywords (Portuguese):
Keywords (English):
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/20.500.14289/9490
Abstract: Large, continuously growing knowledge bases have attracted considerable research interest in recent years. Most techniques focus on algorithms that help a knowledge base (KB) expand automatically or semi-automatically. However, many of the tools used to expand KBs can extract incomplete or incorrect data, making the KB inconsistent. This work therefore aims both to expand large knowledge bases and to detect inconsistencies in them. To that end, it uses an association rule mining algorithm and temporal correlation. When association rules are mined from large knowledge bases, the missing-value problem must be taken into account, since these bases grow daily and do not contain all of the data. A new parameter for the support calculation, MSC, was therefore created to handle missing values. In addition, a major problem with association rules is the effort required to analyze each extracted rule. This work thus developed the ER component, which eliminates redundant and irrelevant association rules. Each valid rule is then used by the TARE component to detect inconsistencies. TARE introduces the concept of STARs (specific temporal association rules), which are used to detect possible inconsistencies. Each relevant STAR is passed as input to the TCI component, which derives temporal correlations that (i) detect possible inconsistencies and (ii) help populate the KB. Experiments showed that association rules and temporal correlation can expand the knowledge base, reducing the number of missing values. Moreover, both the TARE and TCI components were efficient at detecting possible inconsistencies in the data set. Finally, the ER component reduced the number of rules by more than 30% without any loss in the process of populating the KB.
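The abstract names a missing-value-aware support parameter (MSC) and a redundancy filter (ER) without defining them. The Python sketch below only illustrates the general ideas under stated assumptions: it computes classic support and confidence for rules over a toy KB, adds a hypothetical missing-aware variant that simply ignores entities whose relevant relations are unknown, and flags a rule as redundant when a rule with a smaller antecedent and the same consequent is at least as confident (a common pruning criterion). The KB contents, relation names, and the functions `support`, `confidence`, and `is_redundant` are hypothetical; this is not the thesis's MSC formula or its ER, TARE, or TCI components.

```python
from itertools import combinations

# Hypothetical toy KB: entity -> {relation: value}. A relation absent from an
# entity's dict is treated as unknown (missing), not as a false fact.
KB = {
    "ana":   {"worksFor": "ufscar", "livesIn": "sao_carlos", "country": "brazil"},
    "bruno": {"worksFor": "ufscar", "livesIn": "sao_carlos"},  # country missing
    "carla": {"worksFor": "ufscar", "country": "brazil"},      # livesIn missing
    "diego": {"worksFor": "usp", "livesIn": "sao_paulo", "country": "brazil"},
}


def matches(facts, items):
    """True if the entity has every (relation, value) pair in items."""
    return all(facts.get(rel) == val for rel, val in items)


def known(facts, items):
    """True if every relation mentioned in items has a known value."""
    return all(rel in facts for rel, _ in items)


def support(kb, items, missing_aware=False):
    """Fraction of entities that match all items.

    Classic support divides by every entity. The missing-aware variant is an
    illustrative assumption (not the thesis's MSC): it divides only by
    entities whose relations in `items` are all known.
    """
    pool = [f for f in kb.values() if not missing_aware or known(f, items)]
    if not pool:
        return 0.0
    return sum(matches(f, items) for f in pool) / len(pool)


def confidence(kb, antecedent, consequent, missing_aware=False):
    """Fraction of entities matching the antecedent that also match the consequent."""
    covered = [f for f in kb.values() if matches(f, antecedent)]
    if missing_aware:
        # Skip entities whose consequent relation is unknown.
        covered = [f for f in covered if known(f, [consequent])]
    if not covered:
        return 0.0
    return sum(matches(f, [consequent]) for f in covered) / len(covered)


def is_redundant(kb, antecedent, consequent, missing_aware=False):
    """Assumed criterion: a rule is redundant if a strictly smaller, non-empty
    antecedent predicts the same consequent with at least the same confidence."""
    conf = confidence(kb, antecedent, consequent, missing_aware)
    for k in range(1, len(antecedent)):
        for sub in combinations(antecedent, k):
            if confidence(kb, list(sub), consequent, missing_aware) >= conf:
                return True
    return False


if __name__ == "__main__":
    rule = ([("worksFor", "ufscar")], ("country", "brazil"))
    print(confidence(KB, *rule))                      # 0.6666666666666666
    print(confidence(KB, *rule, missing_aware=True))  # 1.0
    longer = [("worksFor", "ufscar"), ("livesIn", "sao_carlos")]
    print(is_redundant(KB, longer, ("country", "brazil")))  # True
```

In this toy setting, bruno's unknown country lowers the classic confidence of worksFor=ufscar -> country=brazil to 2/3, while the missing-aware variant skips bruno and reports 1.0. This illustrates why treating absent facts as negative evidence can understate rules in an incomplete, growing KB.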