Optimum-path forest in support of collaborative filtering

Bibliographic details
Year of defense: 2023
Main author: Martins, Guilherme Brandão
Advisor: Papa, João Paulo
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defense institution: Universidade Federal de São Carlos
São Carlos Campus
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/20.500.14289/19885
Abstract: Machine learning algorithms are being applied to various computational challenges, among which Recommender Systems (RS) present a range of techniques and approaches to effectively manage large volumes of data and provide personalized and relevant content to users. Such systems must be able to handle data-related issues such as sparsity, scalability, and the cold-start problem, and Collaborative Filtering (CF) has traditionally been the primary strategy for addressing those challenges. One way to tackle those problems and improve recommendation results is by leveraging auxiliary information sources to compensate for the lack of CF data, such as user-item interactions. However, different interpretations of the mentioned problems should be explored. The current work contributes to the field of machine learning by proposing approaches to address the mentioned challenges. This thesis presents a collection of works developed by the author throughout the research period, which have been published or submitted up to the present, encompassing: (i) a systematic literature review that analyzes and discusses recent deep learning approaches employed for CF under sparsity-related conditions, while also identifying the challenges and limitations within the field; (ii) a Matrix Factorization (MF)-based approach that leverages CF-related sparsity for the purpose of classifier fusion; (iii) an alternative unsupervised Optimum-Path Forest (OPF) designed to perform efficiently on large-scale datasets by employing a k-approximate-nearest-neighbors graph as its adjacency relation; and (iv) an OPF clustering model built upon the shared-neighborhood concept to alleviate sparsity and high-dimensionality issues during CF-based recommendation. The experimental results achieved through such works corroborate the hypotheses of the present thesis.