Descoberta de conjuntos de itens frequentes com o modelo de programação MapReduce sobre contextos de incerteza

Detalhes bibliográficos
Ano de defesa: 2015
Autor(a) principal: Carvalho, Juliano Varella de lattes
Orientador(a): Ruiz, Duncan Dubugras Alcoba lattes
Banca de defesa: Não Informado pela instituição
Tipo de documento: Tese
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Pontifícia Universidade Católica do Rio Grande do Sul
Programa de Pós-Graduação: Programa de Pós-Graduação em Ciência da Computação
Departamento: Faculdade de Informática
País: Brasil
Palavras-chave em Português:
Área do conhecimento CNPq:
Link de acesso: http://tede2.pucrs.br/tede2/handle/tede/6254
Resumo: Frequent Itemsets Mining (FIM) is a data mining task used to find relations between dataset items. Apriori is the traditional algorithm of the Generate-and-Test class to discover these relations. Recent studies show that this algorithm and others of this task are not adapted to execute in contexts with uncertainty because these algorithms are not prepared to handle with the probabilities associated to items of the dataset. Nowadays, data with uncertainty occur in many applications, for example, data collected from sensors, information about the presence of objects in satellite images and data from application of statistical methods. Due to big datasets with associated uncertainty, new algorithms have been developed to work in this context: UApriori, UF-Growth and UH-Mine. UApriori, specially, is an algorithm based in expected support, often addressed by scientific community. On the one hand, when this algorithm is applied to big datasets, in a context with associated probabilities to dataset items, it does not present good scalability. On the other hand, some works have evolved the Apriori algorithm joining with the model of programming MapReduce, in order to get a better scalability. With this model, it is possible to discover frequent itemsets using parallel and distributed computation. However, these works focus their efforts on discovering frequent itemsets on deterministic datasets. This thesis present the development, implementation and experiments applied to three algorithms: UAprioriMR, UAprioriMRByT and UAprioriMRJoin. The three cited algorithms evolve the traditional algorithm Apriori, integrating the model of programming MapReduce, on contexts with uncertainty. The algorithm UAprioriMRJoin is a hybrid algorithm based on the UAprioriMR and UAprioriMRByT algorithms. The experiments expose the good performance of the UAprioriMRJoin algorithm, when applied on big datasets, with many distinct items and a small average number of items per transaction in a cluster of nodes.