Dimensionality reduction-based metric learning using information theoretic measures

Bibliographic details
Year of defense: 2024
Main author: Cervati Neto, Alaor
Advisor: Levada, Alexandre Luis Magalhães
Defense committee: Not informed by the institution
Document type: Doctoral thesis
Access type: Open access
Language: English
Defense institution: Universidade Federal de São Carlos, Câmpus São Carlos
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department: Not informed by the institution
Country: Not informed by the institution
Access link: https://repositorio.ufscar.br/handle/20.500.14289/20097
Abstract: Processing large amounts of data to extract useful information is one of the main problems that can be approached with machine learning. One way to obtain this information is to group data according to their common features. In very complex data sets, this task can be accomplished by finding simpler representations of the relations within the data, lowering their dimensionality. Many methods exist to find the groups in a data set automatically; however, reducing the complexity of the data without losing relevant content is computationally costly. An alternative is to treat these data sets as probability distributions of random variables and to use concepts and measures from information theory to find their relations more efficiently. This work describes several dimensionality reduction methods and information-theoretic measures and proposes combining them to obtain better results, creating variants that are more resistant to disruptions in the data or to differences in set sizes. The adaptation of existing methods to include information theory-based measures is tested on real datasets, and the results are formally verified as to their adequacy for obtaining more accurate metrics. In most of the tested cases, the results show better performance than traditional classification methods, while in the others the modifications made those methods more effective for datasets with fewer samples.
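The thesis itself specifies the particular reduction methods and measures used; as a rough illustration of the general idea described in the abstract (treating local neighborhoods of a data set as probability distributions and replacing pointwise distances with an information-theoretic divergence between them), the sketch below fits a diagonal Gaussian to each point's k-nearest-neighbor patch and builds a pairwise symmetrized Kullback-Leibler distance matrix. All function names are hypothetical and this is not the author's method, only a minimal sketch of the concept.

```python
import numpy as np

def local_gaussian(X, idx, k):
    """Fit a diagonal Gaussian (mean, variance per feature) to the
    k nearest neighbors of point idx, under Euclidean distance."""
    d = np.linalg.norm(X - X[idx], axis=1)
    patch = X[np.argsort(d)[:k]]          # includes the point itself
    mu = patch.mean(axis=0)
    var = patch.var(axis=0) + 1e-9        # regularize against zero variance
    return mu, var

def sym_kl(p, q):
    """Symmetrized KL divergence between two diagonal Gaussians (mu, var)."""
    mu1, v1 = p
    mu2, v2 = q
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (mu1 - mu2) ** 2) / v2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (mu2 - mu1) ** 2) / v1 - 1.0)
    return kl12 + kl21

def kl_distance_matrix(X, k=5):
    """Entropic distance matrix: pairwise symmetrized KL divergence
    between the locally estimated Gaussians of every pair of points."""
    params = [local_gaussian(X, i, k) for i in range(len(X))]
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = sym_kl(params[i], params[j])
    return D
```

A matrix like this could then feed any distance-based learner (e.g., a k-NN classifier or a manifold-learning embedding) in place of the usual Euclidean metric, which is the kind of substitution the abstract alludes to.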