Visual analytics for machine learning - computing and leveraging decision boundary maps

Rodrigues, Francisco Caio Maia

Bibliographic details
Year of defense:	2020
Main author:	Rodrigues, Francisco Caio Maia
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	eng
Defense institution:	Biblioteca Digitais de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Machine learning; Dimensionality reduction; Visual analytics; Data visualization
Access link:	https://www.teses.usp.br/teses/disponiveis/45/45134/tde-27112020-071803/
Abstract:	Machine learning classifiers construct decision boundaries that partition the data space into a set of regions to which labels are assigned. Understanding these decision boundaries can help the practical use of such classifiers (for example, by showing how a given model is expected to behave on an empty region) and can give insight into how to improve the training of a given model (for example, by indicating where more training data should be provided). In this thesis we propose and explore visual analytics methods for the explicit construction and use of decision zones of machine learning classifiers. Current methods for visualizing how a classifier behaves on a dataset mainly use color-coded sample scatterplots, which do not explicitly show the actual decision boundaries or confusion zones. We propose an image-based technique to improve such visualizations. The method samples the 2D space of a projection and color-codes relevant classifier outputs, such as the majority class label, the confusion, and the sample density, to create a dense visual depiction of the high-dimensional decision boundaries. Our technique is simple to implement, handles any classifier, and has only two simple-to-control free parameters. We demonstrate our proposal on several real-world high-dimensional datasets, classifiers, and direct and inverse projection techniques. To our knowledge, our work is the first that can create such explicit depictions of decision boundaries and decision zones for any dataset and any classifier, without explicit knowledge of the classifier's internals. Based on these visual depictions of decision boundaries, we developed a visual analytics workflow and associated tooling that allows users to perform two common techniques in machine learning: data augmentation and interactive labeling of unseen samples.
We show that our approach can be used to perform guided data augmentation in order to shape the decision boundaries learned by a classifier according to the user's input. For interactive labeling, we show that our proposed visual depiction of decision boundaries helps in producing improved labeling in an active learning scenario.
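
The image-based idea described in the abstract (sample the 2D projection space, map each sample back to the data space, and record the classifier's output per pixel) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the thesis: `inverse_project` is a hypothetical toy mapping from 2D to 3D, `classify` is a 1-nearest-neighbour stand-in for an arbitrary black-box classifier, and `resolution` is an illustrative grid-size parameter. Only the label is recorded per pixel; confusion and sample density are omitted.

```python
import math

def inverse_project(x, y):
    # Hypothetical inverse projection (assumption): lifts a 2D point to 3D.
    return (x, y, x + y)

def classify(point, training):
    # Toy 1-nearest-neighbour classifier standing in for any black-box model.
    nearest = min(training, key=lambda sample: math.dist(point, sample[0]))
    return nearest[1]

def decision_map(training, resolution, lo=0.0, hi=1.0):
    # Label the centre of every pixel of a resolution x resolution grid
    # covering the square [lo, hi] x [lo, hi] in the 2D projection space.
    step = (hi - lo) / resolution
    grid = []
    for row in range(resolution):
        y = lo + (row + 0.5) * step
        line = []
        for col in range(resolution):
            x = lo + (col + 0.5) * step
            line.append(classify(inverse_project(x, y), training))
        grid.append(line)
    return grid

# Two labelled samples in the 3D data space (illustrative data only).
training = [((0.2, 0.2, 0.4), 0), ((0.9, 0.9, 1.8), 1)]
dmap = decision_map(training, resolution=8)

# Text rendering: '.' marks class 0, '#' marks class 1.
for line in dmap:
    print("".join(".#"[label] for label in line))
```

In this sketch the classifier is only ever called through `classify`, so it can be swapped for any model without touching the grid loop, which mirrors the abstract's claim that no knowledge of the classifier's internals is required.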
