Finding maximum patterns using decision diagrams

Detalhes bibliográficos
Ano de defesa: 2020
Autor(a) principal: Albuquerque, Lucas Braga de
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Não Informado pela instituição
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: http://www.repositorio.ufc.br/handle/riufc/56609
Resumo: Logical Analysis of Data (LAD) is a rule-based algorithm for supervised classification that is based on optimization, combinatorics, and Boolean functions. A central concept in LAD is that of a pattern, which summarizes knowledge extracted from a given dataset. Let D be a set of binary vectors partitioned into a set of positive and a set of negative observations. A positive pattern is a subcube of the n-dimensional hypercube having a nonempty intersection with the positive part of D, and an empty intersection with the negative part of D. An observation is covered by a pattern if it belongs to the corresponding subcube, and the coverage of a pattern is the number of observations in D covered by it. The maximum positive a-pattern problem consists in finding a positive pattern whose coverage is maximum among those that cover a given positive observation a in D, which amounts to solving a nonlinear set covering problem. A generalization of it, the maximum positive pattern problem, asks for a positive pattern of maximum coverage among all patterns, not only among those covering a particular observation. We review all integer linear programming (ILP) approaches from the literature for these two problems and empirically evaluate them using a commercial ILP software. Furthermore, we introduce a dynamic programming model, a merging rule, and all necessary heuristics in order to model and solve the two problems using a recently-developed optimization methodology based on decision diagrams (DDs). The methodology consists of a branch-and-bound (BAB) algorithm, in which DDs play the traditional role of the linear programming relaxation, as well as that of primal heuristics. We also discuss relevant implementation details in order to enhance the performance of the DD-based BAB. Lastly, we compare the performance of our DD-based solver with that of the ILP approaches from the literature. Our results indicate that a straightforward DD-based branch-and-bound implementation typically produces higher quality solutions than a commercial MILP software within a common time limit.