Bibliographic details
Year of defense: |
2023 |
Main author: |
Lucas de Souza Rodrigues |
Advisor: |
Edson Takashi Matsubara |
Defense committee: |
Not provided by the institution |
Document type: |
Thesis
|
Access type: |
Open access |
Language: |
por |
Defending institution: |
Fundação Universidade Federal de Mato Grosso do Sul
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Brazil
|
Keywords in Portuguese: |
|
Access link: |
https://repositorio.ufms.br/handle/123456789/6357
|
Abstract: |
Deep neural networks, especially language and vision models, have been widely applied to real-world problems in recent years. Most models learn from a single type of data (text, image, video, or audio) and are therefore called unimodal models. However, given the growing amount of unstructured information and the variety of existing data formats, new approaches have been developed to enable the use of multiple data types in the same learning model. This work explores data fusion in Multimodal Machine Learning (ML) models. The thesis proposes a simple strategy that merges the different data types between the layers of a multimodal architecture using mathematical operations, attention mechanisms, and residual connections. A second proposal uses multimodal knowledge distillation to optimize the performance of deep learning models by transferring knowledge between modalities of the same domain. The main advance of this work is the use of arithmetic operations, attention mechanisms, and residual connections in multimodal approaches with data fusion. This yielded complementary representations of the modalities, which led to better convergence, with results not significantly different from the state of the art. |