On the design of similarity functions for binary data

Bibliographic details
Year of defense: 2022
Main author: Veras, Marcelo Bruno de Almeida
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.repositorio.ufc.br/handle/riufc/66593
Abstract: Binary feature vectors are a widely used representation in many areas of knowledge. They indicate the presence or absence of certain characteristics; functions that operate on these representations, such as similarity functions, are therefore important for recognizing how similar objects are to each other and for performing tasks such as classification, clustering, and outlier detection. A similarity function is a measure that quantifies this similarity and directly influences the performance of a proposed solution. Because of this, using a good similarity function is fundamental to solving a problem properly. Two approaches are commonly used to choose a similarity function: one is to search for and analyze existing functions that best fit the problem, and the other is to create a new function with the help of a specialist. In this work, both approaches are examined, and a new proposal is made for each, outlining its advantages and disadvantages. For the first, we present a methodology for designing similarity functions and a new function for dealing with sparse data, and we evaluate the proposed function through a series of experiments. For the second, we propose an automated framework that learns from data to generate similarity functions appropriate to a given task. This framework was developed to generate functions with the theoretical properties required of a similarity function. Again, a series of experiments is conducted to assess its importance. We evaluated the performance of both proposals against 63 other similarity functions. Based on the results, we can state that in both cases our proposals outperformed classical functions in most of the tested cases.
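
To make the notion of a similarity function for binary data concrete, the following is a minimal sketch of two classical functions, the Jaccard coefficient and the simple matching coefficient, computed from the usual four match counts. These are well-known textbook functions chosen for illustration; they are not the function or framework proposed in the thesis, and the helper names (contingency_counts, jaccard, simple_matching) as well as the convention of returning 1.0 for an empty denominator are assumptions made here.

```python
from typing import Sequence, Tuple


def contingency_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two binary vectors of equal length.

    a: positions where both are 1      b: x is 1, y is 0
    c: x is 0, y is 1                  d: positions where both are 0
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient a / (a + b + c); ignores shared absences."""
    a, b, c, _ = contingency_counts(x, y)
    denom = a + b + c
    # Assumed convention: two all-zero vectors are treated as identical.
    return a / denom if denom else 1.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / (a + b + c + d)."""
    a, b, c, d = contingency_counts(x, y)
    n = a + b + c + d
    # Assumed convention: two empty vectors are treated as identical.
    return (a + d) / n if n else 1.0


# Two sparse vectors that share one present feature and disagree on two.
x = [1, 0, 0, 0, 1, 0, 0, 0]
y = [1, 0, 0, 0, 0, 0, 1, 0]
print(contingency_counts(x, y))
print(jaccard(x, y), simple_matching(x, y))
```

On these sparse vectors the simple matching coefficient is dominated by the five shared absences, while the Jaccard coefficient counts only the features that are present in at least one vector. This kind of disagreement between classical functions is one reason the choice of similarity function matters for sparse binary data.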