Finding maximum patterns using decision diagrams

Albuquerque, Lucas Braga de

Bibliographic Details
Main Author:	Albuquerque, Lucas Braga de
Publication Date:	2020
Format:	Master thesis
Language:	por
Source:	Repositório Institucional da Universidade Federal do Ceará (UFC)
Download full:	http://www.repositorio.ufc.br/handle/riufc/56609
Summary:	Logical Analysis of Data (LAD) is a rule-based algorithm for supervised classification that is based on optimization, combinatorics, and Boolean functions. A central concept in LAD is that of a pattern, which summarizes knowledge extracted from a given dataset. Let D be a set of binary vectors partitioned into a set of positive and a set of negative observations. A positive pattern is a subcube of the n-dimensional hypercube that has a nonempty intersection with the positive part of D and an empty intersection with the negative part of D. An observation is covered by a pattern if it belongs to the corresponding subcube, and the coverage of a pattern is the number of observations in D that it covers. The maximum positive a-pattern problem consists of finding a positive pattern whose coverage is maximum among those that cover a given positive observation a in D, which amounts to solving a nonlinear set covering problem. Its generalization, the maximum positive pattern problem, asks for a positive pattern of maximum coverage among all positive patterns, not only among those covering a particular observation. We review all integer linear programming (ILP) approaches from the literature for these two problems and empirically evaluate them using a commercial ILP solver. Furthermore, we introduce a dynamic programming model, a merging rule, and the heuristics needed to model and solve the two problems using a recently developed optimization methodology based on decision diagrams (DDs). The methodology consists of a branch-and-bound (BAB) algorithm in which DDs play the traditional role of the linear programming relaxation as well as that of primal heuristics. We also discuss implementation details that enhance the performance of the DD-based BAB. Lastly, we compare the performance of our DD-based solver with that of the ILP approaches from the literature. Our results indicate that a straightforward DD-based branch-and-bound implementation typically produces higher-quality solutions than a commercial MILP solver within a common time limit.
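
The definitions in the summary (pattern as a subcube, coverage, and the maximum positive a-pattern problem) can be illustrated with a small brute-force sketch. This is not the thesis's DD-based method or any of the ILP models it reviews; it is only an exhaustive enumeration meant to make the definitions concrete. The function names, the dictionary encoding of a subcube (fixed index to required bit), and the toy data are all assumptions introduced here for illustration.

```python
from itertools import combinations

def covers(pattern, obs):
    """A pattern is a dict {index: required_bit}; it covers obs if all fixed bits agree."""
    return all(obs[i] == bit for i, bit in pattern.items())

def coverage(pattern, observations):
    """Number of observations covered by the pattern."""
    return sum(covers(pattern, o) for o in observations)

def is_positive_pattern(pattern, positives, negatives):
    """Covers at least one positive observation and no negative observation."""
    return (any(covers(pattern, p) for p in positives)
            and not any(covers(pattern, q) for q in negatives))

def max_positive_a_pattern(a, positives, negatives):
    """Exhaustive search (exponential in n) over subcubes containing a.

    Any subcube containing a fixes some subset of coordinates to the values of a.
    Assumes a is a positive observation that does not also appear in negatives.
    """
    n = len(a)
    best, best_cov = None, -1
    for k in range(n + 1):
        for idxs in combinations(range(n), k):
            pattern = {i: a[i] for i in idxs}
            if any(covers(pattern, q) for q in negatives):
                continue  # not a positive pattern
            cov = coverage(pattern, positives)
            if cov > best_cov:
                best, best_cov = pattern, cov
    return best, best_cov

# Hypothetical toy dataset with n = 3 binary attributes.
positives = [(1, 1, 0), (1, 0, 0), (0, 1, 1)]
negatives = [(0, 0, 0), (0, 1, 0)]
best, best_cov = max_positive_a_pattern(positives[0], positives, negatives)
print(best, best_cov)
```

Because a positive pattern covers no negative observation, its coverage over D equals its coverage over the positive part of D, which is why the search counts only positive observations. The enumeration visits all 2^n subsets of coordinates, so it is usable only on tiny instances; the thesis addresses the problem with ILP and DD-based branch-and-bound instead.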

Item metadata

id	UFC-7_92e12f9bbd58198b63b02e045e9d23f3
oai_identifier_str	oai:repositorio.ufc.br:riufc/56609
network_acronym_str	UFC-7
network_name_str	Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
dc.title.pt_BR.fl_str_mv	Finding maximum patterns using decision diagrams
title	Finding maximum patterns using decision diagrams
title_short	Finding maximum patterns using decision diagrams
title_full	Finding maximum patterns using decision diagrams
title_fullStr	Finding maximum patterns using decision diagrams
title_full_unstemmed	Finding maximum patterns using decision diagrams
title_sort	Finding maximum patterns using decision diagrams
author	Albuquerque, Lucas Braga de
author_facet	Albuquerque, Lucas Braga de
author_role	author
dc.contributor.author.fl_str_mv	Albuquerque, Lucas Braga de
dc.contributor.advisor1.fl_str_mv	Bonates, Tibérius de Oliveira e
contributor_str_mv	Bonates, Tibérius de Oliveira e
dc.subject.por.fl_str_mv	Padrão máximo; Análise lógica de dados; Diagrama de decisão; Branch-and-bound; Maximum pattern; Logical analysis of data; Decision diagram
topic	Padrão máximo; Análise lógica de dados; Diagrama de decisão; Branch-and-bound; Maximum pattern; Logical analysis of data; Decision diagram
publishDate	2020
dc.date.issued.fl_str_mv	2020
dc.date.accessioned.fl_str_mv	2021-02-19T09:38:19Z
dc.date.available.fl_str_mv	2021-02-19T09:38:19Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	ALBUQUERQUE, Lucas Braga de. Finding maximum patterns using decision diagrams. 2020. 71 f. Dissertação (Mestrado em Modelagem e Métodos Quantitativos) - Centro de Ciências, Universidade Federal do Ceará, 2020.
dc.identifier.uri.fl_str_mv	http://www.repositorio.ufc.br/handle/riufc/56609
identifier_str_mv	ALBUQUERQUE, Lucas Braga de. Finding maximum patterns using decision diagrams. 2020. 71 f. Dissertação (Mestrado em Modelagem e Métodos Quantitativos) - Centro de Ciências, Universidade Federal do Ceará, 2020.
url	http://www.repositorio.ufc.br/handle/riufc/56609
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.source.none.fl_str_mv	reponame:Repositório Institucional da Universidade Federal do Ceará (UFC) instname:Universidade Federal do Ceará (UFC) instacron:UFC
instname_str	Universidade Federal do Ceará (UFC)
instacron_str	UFC
institution	UFC
reponame_str	Repositório Institucional da Universidade Federal do Ceará (UFC)
collection	Repositório Institucional da Universidade Federal do Ceará (UFC)
bitstream.url.fl_str_mv	http://repositorio.ufc.br/bitstream/riufc/56609/4/license.txt http://repositorio.ufc.br/bitstream/riufc/56609/3/2020_dis_lbalbuquerque.pdf
bitstream.checksum.fl_str_mv	4d8f4e989fd8622bc24a719aca4d64ce 7f43ea014e95af48da983a161415c57e
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv	bu@ufc.br || repositorio@ufc.br
_version_	1847792409419710464

Similar Items