Finding maximum patterns using decision diagrams

Bibliographic Details
Main Author: Albuquerque, Lucas Braga de
Publication Date: 2020
Format: Master thesis
Language: Portuguese (por)
Source: Repositório Institucional da Universidade Federal do Ceará (UFC)
Download full: http://www.repositorio.ufc.br/handle/riufc/56609
Summary: Logical Analysis of Data (LAD) is a rule-based method for supervised classification that draws on optimization, combinatorics, and Boolean functions. A central concept in LAD is that of a pattern, which summarizes knowledge extracted from a given dataset. Let D be a set of binary vectors partitioned into a set of positive and a set of negative observations. A positive pattern is a subcube of the n-dimensional hypercube that has a nonempty intersection with the positive part of D and an empty intersection with the negative part of D. An observation is covered by a pattern if it belongs to the corresponding subcube, and the coverage of a pattern is the number of observations in D covered by it. The maximum positive a-pattern problem consists of finding a positive pattern whose coverage is maximum among those that cover a given positive observation a in D, which amounts to solving a nonlinear set covering problem. Its generalization, the maximum positive pattern problem, asks for a positive pattern of maximum coverage among all patterns, not only among those covering a particular observation. We review all integer linear programming (ILP) approaches from the literature for these two problems and empirically evaluate them using a commercial ILP solver. Furthermore, we introduce a dynamic programming model, a merging rule, and the necessary heuristics to model and solve the two problems using a recently developed optimization methodology based on decision diagrams (DDs). The methodology consists of a branch-and-bound (BAB) algorithm in which DDs play the traditional role of the linear programming relaxation, as well as that of primal heuristics. We also discuss implementation details that enhance the performance of the DD-based BAB. Lastly, we compare the performance of our DD-based solver with that of the ILP approaches from the literature. Our results indicate that a straightforward DD-based branch-and-bound implementation typically produces higher-quality solutions than a commercial mixed-integer linear programming (MILP) solver within a common time limit.
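The definitions in the summary (subcube, coverage, positive pattern, maximum positive a-pattern) can be illustrated with a small Python sketch. It is not the thesis's ILP or decision-diagram method: it enumerates every candidate pattern exhaustively, which is only feasible for tiny n. The representation of a pattern as a dictionary of fixed coordinates, the function names, and the toy dataset are all assumptions made for illustration.

```python
from itertools import combinations

# A pattern (subcube) is represented, by assumption, as a dict mapping a
# coordinate index to the bit value fixed at that coordinate; all other
# coordinates are free.

def covers(pattern, x):
    """True if observation x lies in the subcube described by pattern."""
    return all(x[i] == bit for i, bit in pattern.items())

def coverage(pattern, observations):
    """Number of observations covered by pattern."""
    return sum(covers(pattern, x) for x in observations)

def is_positive_pattern(pattern, positives, negatives):
    """Covers at least one positive observation and no negative one."""
    return (any(covers(pattern, x) for x in positives)
            and not any(covers(pattern, x) for x in negatives))

def max_positive_a_pattern(a, positives, negatives):
    """Exhaustive search for a maximum-coverage positive pattern covering a.

    Any subcube containing a fixes some subset of coordinates to a's own
    values, so it suffices to enumerate subsets of coordinates.
    """
    n = len(a)
    best, best_cov = None, -1
    for k in range(n + 1):
        for idx in combinations(range(n), k):
            pattern = {i: a[i] for i in idx}
            if is_positive_pattern(pattern, positives, negatives):
                cov = coverage(pattern, positives + negatives)
                if cov > best_cov:
                    best, best_cov = pattern, cov
    return best, best_cov

# Hypothetical toy dataset with n = 3.
positives = [(1, 1, 0), (1, 0, 0), (1, 1, 1)]
negatives = [(0, 1, 0), (0, 0, 1)]

result = max_positive_a_pattern((1, 1, 0), positives, negatives)
print(result)  # ({0: 1}, 3)
```

In this toy instance, fixing only the first coordinate to 1 separates all three positive observations from both negative ones, so the best pattern has coverage 3. The enumeration visits 2^n subsets of coordinates, which is why approaches such as ILP or DD-based branch-and-bound are needed at realistic sizes.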
Record ID: UFC-7_92e12f9bbd58198b63b02e045e9d23f3
OAI identifier: oai:repositorio.ufc.br:riufc/56609
Network acronym: UFC-7
Advisor: Bonates, Tibérius de Oliveira e
Subjects: Padrão máximo; Análise lógica de dados; Diagrama de decisão; Branch-and-bound; Maximum pattern; Logical analysis of data; Decision diagram
Date accessioned: 2021-02-19T09:38:19Z
Date available: 2021-02-19T09:38:19Z
Status: publishedVersion
Citation: ALBUQUERQUE, Lucas Braga de. Finding maximum patterns using decision diagrams. 2020. 71 f. Dissertação (Mestrado em Modelagem e Métodos Quantitativos) - Centro de Ciências, Universidade Federal do Ceará, 2020.
Rights: openAccess
Bitstream (license): http://repositorio.ufc.br/bitstream/riufc/56609/4/license.txt (MD5: 4d8f4e989fd8622bc24a719aca4d64ce)
Bitstream (full text, PDF): http://repositorio.ufc.br/bitstream/riufc/56609/3/2020_dis_lbalbuquerque.pdf (MD5: 7f43ea014e95af48da983a161415c57e)
Repository contact: bu@ufc.br, repositorio@ufc.br