Multimodal classification for detecting products that do not comply with the Americanas S.A.'s marketplace sales policies

Bibliographic details
Year of defense: 2022
Main author: Romualdo, Alan da Silva
Advisor: Caseli, Helena de Medeiros
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: eng
Defending institution: Universidade Federal de São Carlos
Câmpus São Carlos
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/20.500.14289/16443
Abstract: In the e-commerce domain, classification methods are needed for categorization, information retrieval, and product recommendation, and the products involved are generally described by different modalities: images and texts. Because the characteristics of these modalities vary widely, and because information is often absent or incomplete (for example, incomplete product attributes), classification methods have difficulty using this information to improve their results. This work therefore investigates multimodal learning over the visual and textual modalities for e-commerce. Our experiments show good results for the classification of products from the “Adult” and “Illegal Devices” categories, which are part of the dataset provided by the partner company of this project. In these experiments, models were trained on each modality separately, yielding text and image models, and on the fusion of the two modalities, yielding a multimodal model. The best models were the binary textual models trained on product titles and descriptions: TD bin-adult (with a recall of 98%) and TD bin-illegal (with a recall of 95%). We also report some insights into multimodal classification, mainly regarding the visual modality, which, owing to its nature, could not capture patterns as well as the textual models.