Multi-task Learning Applied to Computer Vision Problems

Bibliographic details
Year of defense: 2022
Main author: DIOGO NUNES GONCALVES
Advisor: Hemerson Pistori
Defense committee: Not informed by the institution
Document type: Doctoral thesis
Access type: Open access
Language: Portuguese
Defending institution: Fundação Universidade Federal de Mato Grosso do Sul
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufms.br/handle/123456789/4841
Abstract: Deep learning has been widely studied, mainly to solve problems considered complex. In general, these problems can be described and divided into a set of tasks. These tasks are intrinsic to the general problem: they arise naturally because they are part of the problem's essence. Although they can be learned in isolation, they are related to the solution of the general problem. Moreover, for a large computer vision problem, performing the distinct tasks individually becomes expensive in memory and inference time. To address these issues, several approaches such as Multi-task Learning (MTL) have been proposed. The idea is to mimic human learning, in which people learn new tasks through experience gained in learning similar ones. This approach allows the tasks to be learned simultaneously, building relationships between them. Following these directions, this work, presented as a collection of articles, proposes MTL approaches for solving computer vision problems. Initially, two problems were addressed: detection of plantation lines in the first article and detection of fingerlings in the second. In plantation-line detection, the idea is to divide the problem into identifying the plants individually and detecting the plantation lines. In fingerling detection, the tasks are detecting the fingerling and identifying its direction in subsequent frames. For both problems, a method was proposed with a backbone that extracts the initial features for all tasks. Taking these initial features as input, independent branches learn the solution of each task. Information is exchanged between tasks by concatenating features extracted at specific points in each branch. The results showed that sharing between tasks is important to the solution, achieving results superior to the state of the art.
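The shared-backbone layout described above can be sketched in PyTorch. This is a minimal, hypothetical illustration of the general pattern (shared feature extractor, independent per-task branches, cross-branch feature concatenation), not the thesis's actual model; all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class TwoTaskMTL(nn.Module):
    """Sketch of a two-task MTL network with a shared backbone and
    information exchange via feature concatenation between branches."""

    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        # Shared backbone: extracts the initial features used by every task.
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        # Independent first stage of each task branch.
        self.branch_a1 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        self.branch_b1 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        # Second stage consumes its own features concatenated with the
        # other branch's features (the cross-task exchange point).
        self.head_a = nn.Conv2d(2 * feat_ch, 1, 1)  # e.g. per-plant map
        self.head_b = nn.Conv2d(2 * feat_ch, 1, 1)  # e.g. line map

    def forward(self, x):
        shared = self.backbone(x)
        fa = torch.relu(self.branch_a1(shared))
        fb = torch.relu(self.branch_b1(shared))
        # Exchange of information between tasks through concatenation.
        out_a = self.head_a(torch.cat([fa, fb], dim=1))
        out_b = self.head_b(torch.cat([fb, fa], dim=1))
        return out_a, out_b

model = TwoTaskMTL()
a, b = model(torch.randn(1, 3, 64, 64))
print(a.shape, b.shape)  # both torch.Size([1, 1, 64, 64])
```

In practice, losses from both heads would be summed (possibly weighted) so gradients from each task update the shared backbone jointly.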
In addition to these two proposals, a new semantic segmentation method using MTL and an attention mechanism was proposed. The main advance is the use of weights learned by Transformers to indicate the importance of one task to the others, so that only image regions considered relevant influence the other tasks. Results on two problems, plantation lines and gaps, and leaf segmentation and defoliation, showed the effectiveness of the approach compared to the state of the art.
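The idea of letting Transformer-style attention weights decide which regions of one task influence another can be sketched as scaled dot-product attention between two tasks' feature maps. This is an illustrative assumption of how such a mechanism might look, not the thesis's exact architecture; the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class CrossTaskAttention(nn.Module):
    """Sketch: attention weights gate how much task B's features
    influence task A, region by region."""

    def __init__(self, ch=32):
        super().__init__()
        self.q = nn.Linear(ch, ch)  # queries from the receiving task (A)
        self.k = nn.Linear(ch, ch)  # keys from the contributing task (B)
        self.v = nn.Linear(ch, ch)  # values from the contributing task (B)
        self.scale = ch ** -0.5

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (batch, regions, channels), i.e. spatial
        # positions flattened into a sequence of regions.
        attn = torch.softmax(
            self.q(feat_a) @ self.k(feat_b).transpose(1, 2) * self.scale,
            dim=-1,
        )
        # Regions of task B receiving near-zero attention weight barely
        # contribute, so only relevant regions influence task A.
        return feat_a + attn @ self.v(feat_b)

xa = torch.randn(2, 16, 32)  # task A features: 16 spatial regions
xb = torch.randn(2, 16, 32)  # task B features
out = CrossTaskAttention(32)(xa, xb)
print(out.shape)  # torch.Size([2, 16, 32])
```

The residual addition keeps task A's own representation intact while mixing in the attention-selected contribution from task B.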