Development of estimation of distribution algorithms for linear genetic programming

Detalhes bibliográficos
Ano de defesa: 2020
Autor(a) principal: Sotto, Leo Francoso Dal Piccol [UNIFESP]
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Tese
Tipo de acesso: Acesso aberto
Idioma: eng
Instituição de defesa: Universidade Federal de São Paulo (UNIFESP)
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: https://sucupira.capes.gov.br/sucupira/public/consultas/coleta/trabalhoConclusao/viewTrabalhoConclusao.jsf?popup=true&id_trabalho=10889466
https://hdl.handle.net/11600/64929
Resumo: Linear Genetic Programming (LGP) is a Genetic Programming (GP) variant that has been successfully applied in various domains, such as regression, classification, and navigation. Differently from traditional GP, that represents programs as trees, LGP uses lists of instructions, which causes the data flow to be represented as a Directed Acyclic Graph (DAG) and introduces features as, for example, non-effective code and code reuse. As in other Evolutionary Algorithms (EAs), LGP’s stochastic search process neither has the knowledge to produce good quality solutions nor is it able to avoid poor quality programs, which reduces its efficacy. Furthermore, their recombination operators often ignore the correlation between the different positions in the genotype. To deal with these issues in EAs, researchers proposed the Estimation of Distribution Algorithms (EDAs), that use a probability model to sample promising solutions instead of applying recombination operators. The first goal of this PhD thesis is to propose EDAs for LGP that can make use of the LGP representation features and model dependencies between variables. Two forms of doing that are explored: 1) Adapting a Stochastic Context-free Grammar (SCFG) to sample sequences of instructions instead of derivation trees and integrating it in LGP via hybrid versions that combine the use of the grammar with the application of traditional LGP genetic operators; 2) Creating an intermediary integer vector that represents a sequence of instructions and using it to build a Bayesian Network. The resulting techniques are validated on regression and classification problems, and can outperform LGP when the hybrid version is considered. The thesis also address challenges in designing EDAs for the LGP representation. Given that the LGP representation features play an important role in how new methods should be designed, research was also conducted on the role of non-effective instructions in LGP and the impact of the DAG representation compared to the standard tree representation, in order to better understand how the technique works and thus to improve the design of new methods based on it. The conclusions are that non-effective instructions are an important component of LGP programs, although its benefits are dependent on how the algorithm is used, and it is also shown that DAGs present a great advantage over trees for solving determined classes of problems, specially design of digital circuits.