Detalhes bibliográficos
Ano de defesa: |
2024 |
Autor(a) principal: |
Silva, Pedro de Carvalho Braga Ilidio |
Orientador(a): |
Não Informado pela instituição |
Banca de defesa: |
Não Informado pela instituição |
Tipo de documento: |
Dissertação
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
eng |
Instituição de defesa: |
Biblioteca Digitais de Teses e Dissertações da USP
|
Programa de Pós-Graduação: |
Não Informado pela instituição
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: |
|
Link de acesso: |
https://www.teses.usp.br/teses/disponiveis/76/76133/tde-01072024-082057/
|
Resumo: |
The present study investigates decision forest algorithms for predicting interactions in bipartite networks.We concentrate on examples of such problems in the biological domain, such as drugprotein interactions, microRNA-gene interactions or long non-coding RNA-protein interactions. Notwithstanding, the proposed methods encompass the broad range of tasks satisfying i) the goal is to predict interactions between two entities; ii) the interacting pairs are composed of two different types of entities; and iii) each type of entity has its own set of input features. We refer to this paradigm as bipartite interaction learning or bipartite learning. Predicting interactions in such networks has fundamental challenges. For instance, the number of possible interactions is often very large in comparison to the number of known interactions. As a result, the data is frequently sparse, and negative annotations are unreliable. We explore a class of decision forest models specifically designed to address these challenges, that we broadly call bipartite forests. First, we demonstrate how these trees can be adapted to yield a log n speedup in training time. We also propose using weighted-neighbors approaches to determine each leafs output, which resulted in improved generalization. Finally, we introduce semi-supervised impurity functions to bipartite forests. These functions result in trees that also consider clusters of instances in the feature space, rather than only their labels. This is shown to improve the forests resilience to the missing annotations. Our models display highly-competitive performance across ten interaction prediction datasets.We believe the proposed methods can be a crucial step in developing effective and scalable machine learning models for interaction prediction. Further adaptations of these models could also impact other domains, such as recommendation systems, multilabel learning and weak-label learning. |