Robust image features creation by learning how to merge visual and semantic attributes

Bibliographic details
Year of defense: 2021
Main author: Resende, Damares Crystina Oliveira de
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: eng
Defense institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-17032021-122717/
Abstract: There are known advantages to using semantic attributes to improve image representation. However, how such attributes can be used to improve visual subspaces, and their effects on coarse- and fine-grained classification, had yet to be investigated. This research presents a Visual-Semantic Encoder (VSE), built from an undercomplete neural-network autoencoder, that combines visual features and semantic attributes into a compact subspace containing each domain's most relevant properties. It is observed empirically that the learned latent space can better represent image features and even allows results to be interpreted in light of the nature of the semantic attributes, offering a path toward explainable learning. Experiments were performed on four benchmark datasets, where VSE was compared against state-of-the-art algorithms for dimensionality reduction. The algorithm proves robust to up to 20% degradation of the semantic attributes and is as efficient as LLE at learning a low-dimensional feature space with rich class representativeness, opening possibilities for future work on automatically gathering semantic data to improve representations. Additionally, the study suggests experimentally that adding high-level concepts to image representations adds linearity to the feature space, allowing PCA to perform well in combining visual and semantic features to enhance class separability. Finally, experiments were performed on zero-shot learning, where VSE and PCA outperform SAE, the state-of-the-art algorithm proposed by Kodirov, Xiang and Gong (2017), and JDL, the joint discriminative learning framework proposed by Zhang and Saligrama (2016), demonstrating the viability of merging semantic and visual data at both training and test time to learn aspects that transcend class boundaries and allow the classification of unseen data.
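To make the core idea concrete, the sketch below shows one plausible way to realize an undercomplete autoencoder over concatenated visual features and semantic attributes, written in PyTorch. The dimensionalities (2048-d visual features, 85-d attributes, 128-d latent code), the single-layer encoder and decoder, and all names are illustrative assumptions for this sketch, not the exact VSE architecture described in the dissertation.

```python
# Minimal sketch of a visual-semantic undercomplete autoencoder.
# All dimensions and layer choices are assumptions, not the author's exact VSE.
import torch
import torch.nn as nn

class VisualSemanticEncoder(nn.Module):
    def __init__(self, vis_dim=2048, sem_dim=85, latent_dim=128):
        super().__init__()
        in_dim = vis_dim + sem_dim
        # Undercomplete: the bottleneck is smaller than the input, forcing the
        # latent code to retain only the most relevant joint properties.
        self.encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, in_dim)

    def forward(self, visual, semantic):
        x = torch.cat([visual, semantic], dim=1)  # merge the two domains
        z = self.encoder(x)                       # compact latent subspace
        return z, self.decoder(z)                 # code + reconstruction

# Training sketch: minimize reconstruction error of the joint vector, then
# use the latent code z as the learned image representation.
model = VisualSemanticEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
visual = torch.randn(32, 2048)  # placeholder visual features (e.g., from a CNN)
semantic = torch.rand(32, 85)   # placeholder semantic attribute vectors
for _ in range(10):
    z, recon = model(visual, semantic)
    loss = nn.functional.mse_loss(recon, torch.cat([visual, semantic], dim=1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this formulation, robustness to partially degraded semantic attributes can be probed by corrupting a fraction of the semantic inputs at test time and measuring how much the latent code, and any classifier built on it, deteriorates.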