Hybrid and semi-supervised predictive bi-clustering trees for interaction prediction

Bibliographic details
Year of defense: 2023
Main author: Alves, André Hallwas Ribeiro
Advisor: Cerri, Ricardo
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: por
Defense institution: Universidade Federal de São Carlos
São Carlos campus
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/ufscar/18570
Abstract: Interaction data is obtained by observing and recording interactions between objects, and it can be used to solve many complex problems. One use is to predict new interactions from existing ones, a task that machine learning methods can perform. The study of interactions between objects is essential in several areas, such as recommender systems, social network analysis, the pharmaceutical industry, and bioinformatics. Machine learning is a subarea of artificial intelligence in which algorithms learn to perform a task automatically by training on a previously provided dataset. In this work, we developed two methods based on Predictive Bi-Clustering Trees (PBCTs) for predicting interactions in interaction datasets from medicine and bioinformatics. We highlight that global multi-label methods such as PBCTs can learn and predict all interactions of an object in a single task and exploit the relationships between the object spaces. First, we build a hybrid learning model that combines PBCT and Extreme Gradient Boosting (XGBoost): in the first stage, the PBCT generates partitions of the interaction matrix; in the second stage, an XGBoost model is induced in each partition to reduce the imbalance between positive and negative interactions in the outcome predictions. Positive interactions indicate that an interaction occurred, negative interactions indicate that it did not occur, and unknown (unlabeled) interactions are cases for which no information is available.
To take advantage of the unlabeled data in the PBCT induction procedure, we propose a semi-supervised adaptation of the PBCT split function, which allows it to work with labeled and unlabeled data under different levels of supervision and imbalance. Both methods were evaluated on criteria covering effectiveness in predicting interactions and computational performance, and were compared with the original PBCT. In the experiments, both methods showed promising results, contributing to the literature and paving the way for advances in the state of the art.
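
As an illustration only, the sketch below shows one common way a variance-based split can be made semi-supervised. It is not the split function proposed in the dissertation, which the abstract does not specify. The assumptions here are: unknown interactions are stored as `None`, the label term uses only known cells, a feature-variance term lets unlabeled rows still influence the split, and a hypothetical weight `w` balances the two. All function names and the toy data are invented for this example, and only row-side splits are evaluated; a bi-clustering tree would score column-side splits in the same way.

```python
from statistics import pvariance


def label_variance(Y, rows):
    """Variance of the known (non-None) interaction cells in the given rows."""
    known = [y for r in rows for y in Y[r] if y is not None]
    return pvariance(known) if known else 0.0


def feature_variance(X, rows):
    """Mean per-feature variance over the given rows (uses all rows, labeled or not)."""
    if len(rows) < 2:
        return 0.0
    n_features = len(X[0])
    return sum(pvariance([X[r][j] for r in rows]) for j in range(n_features)) / n_features


def impurity(X, Y, rows, w):
    """Assumed semi-supervised impurity: w weighs labels, (1 - w) weighs features."""
    return w * label_variance(Y, rows) + (1 - w) * feature_variance(X, rows)


def best_row_split(X, Y, w=0.5):
    """Return (gain, feature index, threshold) of the best row-side split."""
    rows = list(range(len(X)))
    parent = impurity(X, Y, rows, w)
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({X[r][j] for r in rows})[:-1]:
            left = [r for r in rows if X[r][j] <= t]
            right = [r for r in rows if X[r][j] > t]
            child = (len(left) * impurity(X, Y, left, w)
                     + len(right) * impurity(X, Y, right, w)) / len(rows)
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


# Toy data: 4 row objects with one feature each, 3 column objects.
# 1 = positive interaction, 0 = negative interaction, None = unknown.
X = [[0.1], [0.2], [0.8], [0.9]]
Y = [[1, None, 1],
     [1, 1, None],
     [0, None, 0],
     [None, 0, 0]]

gain, feature, threshold = best_row_split(X, Y, w=0.5)
print(f"split on feature {feature} at <= {threshold}, gain = {gain:.5f}")
```

Setting `w=1.0` reduces the score to a purely supervised variance reduction over the known cells, while lower values give the feature structure of the unlabeled rows more influence. In practice the two terms would usually be normalized before being combined; that step is omitted here for brevity.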