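Association rules and temporal correlation for populating large knowledge bases and detecting inconsistencies in them (original title: Regras de associação e correlação temporal para popular e detectar Inconsistências em grandes bases de conhecimento)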
Year of defense: | 2017 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | por |
Defending institution: | Universidade Federal de São Carlos, Câmpus São Carlos |
Graduate program: | Programa de Pós-Graduação em Ciência da Computação - PPGCC |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Keywords in English: | |
CNPq knowledge area: | |
Access link: | https://repositorio.ufscar.br/handle/20.500.14289/9490 |
Abstract: | Large, growing knowledge bases have been an active research topic in the past few years. Most techniques focus on constructing algorithms that help a Knowledge Base (KB) expand automatically (or semi-automatically). However, many tools used to expand KBs can extract incomplete or incorrect data, making the KB inconsistent. This work therefore aims both to expand large knowledge bases and to detect inconsistencies in them. To accomplish that, an association rule mining algorithm and temporal correlation are used. When applying an algorithm to extract association rules from large knowledge bases, the missing-value problem needs to be considered, since these bases grow day by day and do not contain all of the data. Therefore, a new parameter, the MSC parameter, was created to perform the support calculation in the presence of missing values. In addition, a major problem with using association rules is the effort spent analyzing each extracted rule. Thus, this work developed the ER component, which eliminates redundant and irrelevant association rules. Each valid rule is then used by the TARE component to detect inconsistencies. TARE introduces the concept of STARs (specific temporal association rules), which are used to detect possible inconsistencies. Each relevant STAR is passed as input to the TCI component, which derives temporal correlations to (i) detect possible inconsistencies and (ii) help populate the KB. Experiments showed that the association rules and the temporal correlation are able to expand the knowledge base, decreasing the number of missing values. Moreover, both the TARE and TCI components were efficient at detecting possible inconsistencies in the data set. Finally, the ER component reduced the number of rules by more than 30% without any loss in the process of populating the KB. |