Streaming, distributed, and asynchronous amortized inference

Bibliographic details
Year of defense: 2024
Main author: Henrique, Tiago da Silva
Advisor: Mesquita, Diego
Defense committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: English (eng)
Defense institution: Not informed by the institution
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Access link: https://hdl.handle.net/10438/36338
Abstract: We address the problem of sampling from an unnormalized distribution defined on a compositional space, i.e., a continuous or discrete set whose elements can be constructed sequentially from an initial state through the application of simple actions. This definition accommodates the space of (directed acyclic) graphs, natural-language sentences of bounded size, and Euclidean n-spaces, among others, and lies at the core of many applications in (Bayesian) statistics and machine learning. In particular, we focus on Generative Flow Networks (GFlowNets), a family of amortized samplers that cast sampling as finding a flow assignment in a flow network such that the total flow reaching a sink node equals that node's unnormalized probability. Despite their remarkable success in drug discovery, structure learning, and natural language processing, important questions regarding the scalability, generalization, and limitations of these models remain largely underexplored in the literature.

In view of this, this thesis contributes both methodological and theoretical advances toward a better understanding and usability of GFlowNets. From a computational perspective, we design novel algorithms for the non-localized training of GFlowNets. These enable learning such models in a streaming and distributed fashion, which is crucial for managing ever-increasing data sizes and for exploiting the architecture of modern computer clusters. The central idea of our methods is to break the flow-assignment problem into easier subproblems solved by separately trained GFlowNets; once trained, these models are aggregated by a global GFlowNet. To do so efficiently, we also revisit the relationship between GFlowNets and variational inference and devise low-variance estimators of the learning objectives' gradients to achieve faster training convergence. Overall, our experiments show that our non-localized procedures often lead to better approximations in less time than a centralized, monolithic GFlowNet. Additionally, we demonstrate that the models corresponding to the global minimizers of the proposed surrogate learning objectives sample in proportion to the unnormalized target.

This fact raises the questions of when a GFlowNet can reach such a global minimum and how close a trained model is to it. Towards answering them, we first present a family of discrete distributions that cannot be approximated by a GFlowNet when the flow functions are parameterized by 1-WL graph neural networks. Then, we develop a computationally amenable metric to probe the distributional accuracy of GFlowNets. Finally, as GFlowNets rely exclusively on a subgraph of the (potentially huge) flow network to learn a flow assignment, we argue that generalization plays a critical role in their success and derive the first non-vacuous (PAC-Bayesian) statistical guarantees for these models.
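To make the flow condition in the abstract concrete, here is the standard formulation from the GFlowNet literature (the notation is assumed, not taken from the thesis). Writing F(s -> s') for the flow on an edge of the state graph and R(x) for the unnormalized target, a valid flow assignment satisfies

    \sum_{s : s \to s'} F(s \to s') = \sum_{s'' : s' \to s''} F(s' \to s'')   for every interior state s',
    \sum_{s : s \to x} F(s \to x) = R(x)                                      for every sink (terminal state) x,

so that sampling actions forward in proportion to the outgoing flows draws x with probability R(x)/Z, where Z = \sum_x R(x). Many training objectives use the equivalent trajectory-balance form

    Z \prod_t P_F(s_{t+1} \mid s_t) = R(x) \prod_t P_B(s_t \mid s_{t+1}),

where P_F and P_B are the forward and backward policies along a trajectory ending in x.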
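As a concrete reference point for how such a flow assignment is learned, the following Python (PyTorch) fragment sketches the standard trajectory-balance loss on a toy compositional space: binary strings of length N, built one bit at a time from the empty string. This is a minimal baseline sketch, not the thesis' non-localized procedure; the toy target and all names are illustrative.

import math
import torch

N = 4
log_R = lambda x: x.sum() * math.log(2.0)   # toy target: R(x) = 2^(number of ones), so Z = 3^N

# Forward policy: one Bernoulli logit per position, enough to represent any
# product distribution over bits (which this toy target is).
logits = torch.zeros(N, requires_grad=True)
log_Z = torch.zeros((), requires_grad=True)  # learned estimate of log Z
opt = torch.optim.Adam([logits, log_Z], lr=0.05)

for _ in range(3000):
    probs = torch.sigmoid(logits)
    x = torch.bernoulli(probs).detach()      # sample one trajectory (= one string)
    log_pf = (x * torch.log(probs) + (1 - x) * torch.log1p(-probs)).sum()
    # The state graph here is a tree, so the backward policy is deterministic
    # and contributes log P_B = 0 to the balance condition.
    loss = (log_Z + log_pf - log_R(x)) ** 2  # squared trajectory-balance residual
    opt.zero_grad(); loss.backward(); opt.step()

print(log_Z.item(), N * math.log(3.0))       # the two values should roughly agree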
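The relationship to variational inference mentioned in the abstract is commonly made through Z-free, variance-reduced objectives: over a batch of trajectories, the learned log Z is replaced by the empirical variance of the balance residual (a VarGrad-style estimator, in which the constant log Z cancels). Whether this coincides with the thesis' estimators is an assumption; the sketch below only illustrates the general idea on the same toy problem.

import math
import torch

N, K = 4, 16                                 # string length, batch size
log_R = lambda x: x.sum(-1) * math.log(2.0)  # same toy target as above
logits = torch.zeros(N, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.05)

for _ in range(3000):
    probs = torch.sigmoid(logits)
    x = torch.bernoulli(probs.expand(K, N)).detach()
    log_pf = (x * torch.log(probs) + (1 - x) * torch.log1p(-probs)).sum(-1)
    delta = log_pf - log_R(x)                # log P_B = 0 on a tree-shaped state graph
    loss = delta.var()                       # batch variance: log Z drops out entirely
    opt.zero_grad(); loss.backward(); opt.step()

print((-delta.mean()).item(), N * math.log(3.0))  # -mean residual estimates log Z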
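Finally, for readers unfamiliar with the PAC-Bayesian framework invoked in the abstract's last sentence, the generic McAllester-style template for losses bounded in [0, 1] is reproduced below to fix ideas; the thesis' guarantee for GFlowNets necessarily instantiates the risk, prior, and posterior differently. For any data-independent prior \pi over parameters and any \delta \in (0, 1), with probability at least 1 - \delta over an i.i.d. n-sample, simultaneously for all posteriors \rho,

    E_{\theta \sim \rho}[L(\theta)] \le E_{\theta \sim \rho}[\hat{L}_n(\theta)] + \sqrt{ (KL(\rho \| \pi) + \ln(2 \sqrt{n} / \delta)) / (2n) },

and such a bound is called non-vacuous when the right-hand side is informative, i.e., strictly below the trivial bound of 1.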