Face image inpainting based on Generative Adversarial Networks

Bibliographic details
Defense year: 2023
Main author: Ivamoto, Victor Soares
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Master's thesis (dissertação)
Access type: Open access
Language: eng
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/100/100131/tde-06022024-184049/
Abstract: Face image inpainting is a challenging computer vision problem with many practical uses, and it is employed in a wide range of image preprocessing applications. In recent years, deep learning has made great breakthroughs in image inpainting, and the impressive results achieved by Generative Adversarial Networks in image processing have drawn the scientific community's attention to facial inpainting. Recent architectural developments include two-stage networks that follow a coarse-to-fine approach and use landmarks, semantic segmentation maps, or edge maps to guide the inpainting process. Moreover, improved convolutions enlarge the receptive field and filter the values passed to the next layer, while attention layers create relationships between local and distant information. The objective of this project is to evaluate and compare the efficacy of the baseline models identified in the literature on face datasets with three occlusion types. To this end, a comparative study was performed among the baseline models to identify the advantages and disadvantages of each model on three types of facial occlusion. A literature review gathered the baseline models, face datasets, occlusions, and evaluation metrics. The baseline models were DF1, DF2, EdgeConnect, GLCI, GMCNN, PConv, and SIIGM; the datasets were CelebA and CelebA-HQ; the occlusions were a square mask centered in the image, an irregular-shape mask, and a facial mask (MTF); and the evaluation metrics were PSNR, SSIM, and LPIPS. The comparative study consisted of two experiments, one training the baseline models from scratch and the other using pretrained models; both experiments followed the same testing procedure. GMCNN achieved the best overall results with the square mask, DF2 was the best model with the irregular mask, and both models were the best with the MTF mask. PConv performed poorly in all experiments, except for the irregular mask with the pretrained model.
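Of the three evaluation metrics named in the abstract, PSNR is the simplest to state: it is the log-ratio between the maximum possible pixel value and the mean squared error between the ground-truth and inpainted images. The sketch below is illustrative only (it is not code from the dissertation) and, for simplicity, represents an image as a flat list of pixel values in the 0-255 range:

```python
import math

def psnr(original, restored, max_val=255.0):
    """Peak Signal-to-Noise Ratio, in dB, between two equal-sized
    images given as flat sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy example: a 2x2 "image" and a slightly perturbed restoration.
ref = [100, 120, 140, 160]
out = [101, 119, 141, 159]
print(round(psnr(ref, out), 2))  # → 48.13
```

Higher PSNR means a closer pixel-wise match; SSIM and LPIPS complement it by measuring structural and perceptual similarity, respectively, which plain MSE-based scores miss.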