Statistical physics analysis of machine learning models

Bibliographic details
Year of defense: 2022
Main author: Veiga, Rodrigo Soares
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: English
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/43/43134/tde-17082022-084404/
Abstract: This thesis presents three main contributions to the understanding of machine learning models, making use of statistical physics tools. First, we investigate the possible relation between the renormalisation group and restricted Boltzmann machines trained on two-dimensional ferromagnetic Ising data, pointing out potentially misleading aspects of preliminary proposals to construct this bridge explicitly. Second, we examine the convergence behaviour of stochastic gradient descent in high-dimensional two-layer neural networks. Building on classic statistical physics approaches and extending them to a broad range of learning rates, time scales, and hidden-layer widths, we construct a phase diagram describing the various learning scenarios that arise in the high-dimensional setting. We also discuss the trade-off between learning rate and hidden-layer width, which has been crucial in recent mean-field theories. Third, we study both the Bayes-optimal and the empirical risk minimisation generalisation errors of the multi-class teacher-student perceptron. We characterise a first-order phase transition in the Bayes-optimal performance for Rademacher teacher weights and observe that, for Gaussian teachers, regularised cross-entropy minimisation can yield close-to-optimal performance.
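The first contribution concerns restricted Boltzmann machines trained on two-dimensional Ising data. As a minimal illustration of that setup (not the thesis's exact pipeline), the sketch below samples small Ising configurations with the Metropolis algorithm and fits a Bernoulli RBM by one-step contrastive divergence; the lattice size, temperature, and training hyperparameters are arbitrary demonstration values.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T = 8, 2.27                      # lattice side, temperature (near critical)
n_samples, n_hidden = 500, 16
lr, epochs = 0.05, 30

def ising_samples(L, T, n, thin=10, burn=200):
    """Metropolis sampling of the 2D ferromagnetic Ising model."""
    s = rng.choice([-1, 1], size=(L, L))
    samples = []
    for t in range(burn + n * thin):
        for _ in range(L * L):      # one Metropolis sweep
            i, j = rng.integers(L, size=2)
            nb = s[(i + 1) % L, j] + s[(i - 1) % L, j] \
               + s[i, (j + 1) % L] + s[i, (j - 1) % L]
            dE = 2 * s[i, j] * nb   # energy cost of flipping spin (i, j)
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                s[i, j] *= -1
        if t >= burn and (t - burn) % thin == 0:
            samples.append((s.flatten() + 1) // 2)   # map {-1,1} -> {0,1}
    return np.asarray(samples, dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V = ising_samples(L, T, n_samples)
W = 0.01 * rng.standard_normal((L * L, n_hidden))    # visible-hidden couplings
b, c = np.zeros(L * L), np.zeros(n_hidden)           # visible, hidden biases

for _ in range(epochs):
    for v0 in V:
        ph0 = sigmoid(v0 @ W + c)                    # hidden probs given data
        h0 = (rng.random(n_hidden) < ph0) * 1.0
        pv1 = sigmoid(W @ h0 + b)                    # one-step reconstruction
        v1 = (rng.random(L * L) < pv1) * 1.0
        ph1 = sigmoid(v1 @ W + c)
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))   # CD-1 updates
        b += lr * (v0 - v1)
        c += lr * (ph0 - ph1)
```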
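The second contribution analyses one-pass (online) stochastic gradient descent for two-layer networks in the high-dimensional teacher-student setting. The following sketch simulates such dynamics for a soft committee machine with tanh units; the dimension d, teacher width K, student width M, and learning rate eta are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, M = 500, 2, 4                 # input dimension, teacher/student widths
eta, steps = 0.5, 20_000            # learning rate, number of SGD steps

g = np.tanh
Wt = rng.standard_normal((K, d))            # fixed teacher weights
Ws = 0.1 * rng.standard_normal((M, d))      # student weights, small init

def output(W, x):
    """Soft committee machine: average of tanh units."""
    return g(W @ x / np.sqrt(d)).mean()

for t in range(steps):
    x = rng.standard_normal(d)              # fresh Gaussian sample each step
    err = output(Ws, x) - output(Wt, x)
    pre = Ws @ x / np.sqrt(d)               # student preactivations
    # one SGD step on the squared error err**2 / 2
    Ws -= eta * np.outer(err * (1 - g(pre) ** 2) / M, x / np.sqrt(d))
    if t % 5_000 == 0:
        X = rng.standard_normal((2_000, d))  # Monte Carlo generalisation error
        eg = 0.5 * np.mean((g(X @ Ws.T / np.sqrt(d)).mean(1)
                            - g(X @ Wt.T / np.sqrt(d)).mean(1)) ** 2)
        print(f"step {t}: estimated generalisation error {eg:.4f}")
```

In the regimes studied analytically, the objects tracked are low-dimensional overlaps between teacher and student weights rather than the weights themselves; a finite-size simulation like this one only illustrates the dynamics being described.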
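The third contribution compares Bayes-optimal and empirical risk minimisation errors for the multi-class teacher-student perceptron. The sketch below sets up that model with Gaussian teacher weights and minimises L2-regularised cross-entropy by gradient descent; the sample size, dimension, number of classes, and ridge strength lam are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, C = 2_000, 200, 3             # samples, dimension, number of classes
lam, lr, iters = 0.01, 0.5, 500     # ridge strength, step size, GD iterations

Wt = rng.standard_normal((C, d))                  # Gaussian teacher weights
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.argmax(X @ Wt.T, axis=1)                   # label = argmax teacher logit

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((C, d))                              # student weights
Y = np.eye(C)[y]                                  # one-hot labels
for _ in range(iters):
    P = softmax(X @ W.T)
    grad = (P - Y).T @ X / n + lam * W            # grad of CE + (lam/2)||W||^2
    W -= lr * grad

Xte = rng.standard_normal((5_000, d)) / np.sqrt(d)   # fresh test set
yte = np.argmax(Xte @ Wt.T, axis=1)
acc = np.mean(np.argmax(Xte @ W.T, axis=1) == yte)
print(f"test accuracy {acc:.3f} (generalisation error {1 - acc:.3f})")
```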