O Problema do Mapeamento de Sequências em Grafos de De Bruijn

Detalhes bibliográficos
Ano de defesa: 2024
Autor(a) principal: LUCAS BARBOSA ROCHA
Orientador(a): Said Sadique Adi
Banca de defesa: Não Informado pela instituição
Tipo de documento: Tese
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Fundação Universidade Federal de Mato Grosso do Sul
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Brasil
Palavras-chave em Português:
Link de acesso: https://repositorio.ufms.br/handle/123456789/9222
Resumo: A relevant problem in Computational Biology consists of the task of mapping one sequence onto another for comparison purposes. Typically, this process utilizes a high-quality reference sequence constructed from a specific set of sequences. However, the limitation of this approach is evident, as the reference sequence tends to be biased, representing only a restricted set of sequences and being incapable of encompassing all possibilities. To mitigate this bias, a good strategy is to represent multiple sequences through more robust structures, such as sequence graphs or De Bruijn graphs, and map sequences onto these graphs. The sequence graph is a graph where each vertex is labeled with one or more characters. In the De Bruijn graph of order k each vertex is labeled with a distinct sequence of length k, and there is an edge from one vertex to another if and only if there exists a length k-1 overlap of the suffix of the first vertex with the prefix of the second vertex. Given as input a sequence s and a sequence (or De Bruijn) graph G, mapping s onto G consists of finding a path p in G such that the induced sequence s' by p is as similar as possible to s. This definition gives rise to the problems addressed in this thesis, namely the Sequence Mapping onto Sequence Graphs problem -- PMSG and the Sequence Mapping onto De Bruijn Graphs problem -- PMSB. Both problems admit three variants: 1) changes only in the sequence, 2) changes in the graph, and 3) changes in both the sequence and the graph. In this work, we present an in-depth analysis of PMSB. For variant 1, we implement and evaluate exact algorithms that solve it. Furthermore, we propose heuristics for PMSB and conduct comparative tests between the exact algorithms, our heuristics, and those found in the literature. Additionally, we perform a study demonstrating that it is possible to convert a De Bruijn graph into a simple sequence graph, such that all sequences from the De Bruijn graph are also induced in the simple sequence graph. As for variant 2, we address the problem by considering the ability to induce new edges when a k-mer is modified in the De Bruijn graph. This approach makes the problem easier, allowing us to present a novel exact polynomial solution for this variant.