Efficient and multilingual text-to-image synthesis: exploring novel architectures and cross-language strategies

Bibliographic details
Year of defense: 2024
Main author: Souza, Douglas Matos de
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defense institution: Pontifícia Universidade Católica do Rio Grande do Sul
Escola Politécnica
Brazil
PUCRS
Programa de Pós-Graduação em Ciência da Computação
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://tede2.pucrs.br/tede2/handle/tede/11642
Abstract: Text-to-image synthesis is the task of generating images from text descriptions. Given a textual description, a text-to-image algorithm can generate multiple novel images that contain the details described in the text. Text-to-image algorithms are appealing for various real-world tasks: with them, machines can draw truly novel images that can be used, for example, for content generation or assisted drawing. The general framework of text-to-image approaches can be divided into two main parts: i) a text encoder and ii) a generative model for images, which learns a conditional distribution over the encoded text. Current text-to-image approaches leverage multiple neural networks to overcome the challenges of learning a generative model over images, which increases the overall framework's complexity as well as the computation required to generate high-resolution images. Additionally, no work so far has explored cross-language models in the context of text-to-image generation, limiting current approaches to supporting only English. This limitation has a significant downside: it restricts access to the technology to users familiar with English, leaving out a substantial number of people who could benefit. In this thesis, we make the following contributions to address each of these gaps. First, we propose a new end-to-end text-to-image approach that relies on a single neural network for the image generator, reducing complexity and computation. Second, we propose a new loss function that improves training and yields more accurate models. Finally, we study how text encoders affect the overall performance of text-to-image generation and propose a novel cross-language approach that extends models to support multiple languages simultaneously.
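
The two-part framework described in the abstract (a text encoder feeding a conditional image generator) can be sketched in PyTorch as follows. This is a minimal illustration only: the module structure, layer sizes, and names below are hypothetical and are not the models proposed in the thesis.

# Minimal sketch of the two-part text-to-image framework: a text encoder
# and an image generator conditioned on the encoded text. All names and
# dimensions are illustrative assumptions, not the thesis's architecture.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Encodes a tokenized caption into a fixed-size embedding."""
    def __init__(self, vocab_size=30000, embed_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (B, T, D)
        _, hidden = self.rnn(embedded)         # (1, B, D)
        return hidden.squeeze(0)               # (B, D)

class ConditionalGenerator(nn.Module):
    """Maps noise concatenated with a text embedding to an image,
    i.e. samples from a distribution conditioned on the encoded text."""
    def __init__(self, noise_dim=100, embed_dim=256, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 3 * img_size * img_size),
            nn.Tanh(),
        )

    def forward(self, noise, text_embedding):
        z = torch.cat([noise, text_embedding], dim=1)
        img = self.net(z)
        return img.view(-1, 3, self.img_size, self.img_size)

# Usage: encode a caption, then sample an image conditioned on it.
encoder = TextEncoder()
generator = ConditionalGenerator()
tokens = torch.randint(0, 30000, (1, 16))      # dummy tokenized caption
image = generator(torch.randn(1, 100), encoder(tokens))
print(image.shape)                             # torch.Size([1, 3, 64, 64])

Swapping the text encoder (for instance, for a multilingual one) while keeping the generator fixed is the kind of design question the thesis's cross-language study addresses.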