Multimodal Approaches with Data Fusion in Deep Learning

Bibliographic details
Year of defense: 2023
Main author: Lucas de Souza Rodrigues
Advisor: Edson Takashi Matsubara
Defense committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por)
Defense institution: Fundação Universidade Federal de Mato Grosso do Sul
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufms.br/handle/123456789/6357
Abstract: Deep neural networks, especially language and vision models, have been widely applied to real-world problems in recent years. Models usually rely on a single type of data (text, image, video, or audio) in a learning problem and are therefore called unimodal models. However, given the growing amount of unstructured information and the variety of existing data formats, new approaches have been developed to establish strategies that allow multiple kinds of data to be used in the same learning model. This work explores data fusion in Multimodal Machine Learning (ML) models. The proposal of this thesis explores a simple strategy that uses mathematical operations to merge the different types of data between the layers of the multimodal architecture, together with attention mechanisms and residual connections. A second proposal explores multimodal knowledge distillation to optimize the performance of deep learning models by transferring knowledge between modalities of the same domain. The main contribution of this work is the use of arithmetic operations, attention mechanisms, and residual connections in multimodal approaches with data fusion. This yielded complementary representations of the modalities, which led to better convergence with no significant difference from the state of the art.
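
To make the fusion idea concrete, the following is a minimal PyTorch sketch of how arithmetic fusion combined with attention and a residual connection could look, assuming text and image encoders that already produce embeddings of the same dimension. The class name, the choice of sum-plus-product fusion, and the layer sizes are illustrative assumptions, not the thesis's actual implementation.

    import torch
    import torch.nn as nn

    class ArithmeticFusionBlock(nn.Module):
        """Fuses two modality embeddings with element-wise arithmetic,
        self-attention, and a residual connection (illustrative sketch)."""
        def __init__(self, dim: int, num_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
            # Element-wise arithmetic fusion: the sum and the product capture
            # complementary interactions between the two modalities.
            fused = text_emb + image_emb + text_emb * image_emb  # (batch, seq, dim)
            # Attention over the fused representation.
            attended, _ = self.attn(fused, fused, fused)
            # Residual connection followed by layer normalization.
            return self.norm(fused + attended)

    # Toy usage with random embeddings of matching shape.
    block = ArithmeticFusionBlock(dim=256)
    text = torch.randn(8, 16, 256)   # (batch, tokens, dim)
    image = torch.randn(8, 16, 256)  # (batch, patches, dim)
    out = block(text, image)
    print(out.shape)  # torch.Size([8, 16, 256])

The residual path keeps the fused features available to later layers even when the attention output is uninformative, which is one plausible reading of why such blocks converge well.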
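
Similarly, a minimal sketch of the cross-modal knowledge distillation idea, assuming a teacher trained on one modality and a student trained on another modality of the same domain. The temperature, the loss weighting, and the function name are placeholder assumptions rather than values taken from the thesis.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          labels: torch.Tensor,
                          temperature: float = 2.0,
                          alpha: float = 0.5) -> torch.Tensor:
        # Soft targets from the teacher, softened by the temperature.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # KL term transfers the teacher's soft predictions to the student.
        kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
        # Standard supervised term on the ground-truth labels.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1.0 - alpha) * ce

    # Toy usage with random logits for a 10-class problem.
    student = torch.randn(8, 10, requires_grad=True)
    teacher = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    loss = distillation_loss(student, teacher, labels)
    loss.backward()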