A deep learning system to perform multi-instance multi-label event classification in video game footage
| Field | Value |
| --- | --- |
| Year of defense | 2022 |
| Main author | |
| Advisor | |
| Defense committee | |
| Document type | Master's dissertation (Dissertação) |
| Access type | Open access |
| Language | English (eng) |
| Defense institution | Universidade Federal de Uberlândia, Brazil; Programa de Pós-graduação em Ciência da Computação |
| Graduate program | Not informed by the institution |
| Department | Not informed by the institution |
| Country | Not informed by the institution |
| Keywords in Portuguese | |
| Access links | https://repositorio.ufu.br/handle/123456789/36957 ; http://doi.org/10.14393/ufu.di.2022.562 |
Abstract: Video games, besides being a highly relevant field of entertainment and commerce, have been widely used as case studies in artificial intelligence because they pose problems with a high degree of complexity. In such studies, approaches that endow player agents with the ability to retrieve relevant information from game scenes stand out, since such information can be very useful for improving the agents' learning ability. This work is divided into two parts. The first part proposes and analyses new deep learning-based models to identify game events occurring in Super Mario Bros gameplay footage. These models combine a feature-extracting convolutional neural network (CNN) with a classifier neural network (NN): the CNN produces a feature-based representation of each game scene and submits it to the classifier, which identifies the game event present in that scene. The main contribution of this first part is to demonstrate the superior performance achieved by the models that combine a chunk-based representation of the data with recurrent neural network (RNN) classifiers; a sketch of such an architecture follows the abstract.

The second part presents two deep learning (DL) models designed for multi-instance multi-label (MIML) event classification in gameplay footage. Their architecture is based on a data-generator script, a CNN feature extractor, and a deep classifier neural network. The main contributions of this second part are: 1) the implementation of an automatic data-generator script that produces frames from the game footage (sketched below); 2) the construction of a frame-based and a chunk-based pre-processed, balanced dataset to train the models; 3) the generation of a fine-tuned MobileNetV2, derived from the standard MobileNetV2, specialized in gameplay footage (see the final sketch below); 4) the implementation of the DL models that perform MIML event classification in gameplay footage.
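To make the part-one architecture concrete, here is a minimal sketch of a chunk-based CNN+RNN event classifier in TensorFlow/Keras. It is an illustration under assumed settings, not the dissertation's exact model: the chunk length, frame size, layer widths, and the number of event classes (`NUM_EVENTS`) are all hypothetical.

```python
# Minimal sketch of a chunk-based CNN+RNN event classifier (assumptions:
# TensorFlow/Keras, 10-frame chunks of 128x128 RGB frames, 8 event classes).
import tensorflow as tf
from tensorflow.keras import layers, models

CHUNK_LEN, H, W, C = 10, 128, 128, 3   # assumed chunk and frame sizes
NUM_EVENTS = 8                          # hypothetical number of event classes

# Small CNN that maps a single frame to a feature vector.
frame_encoder = models.Sequential([
    layers.Input((H, W, C)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])

# Apply the CNN to every frame in a chunk, then let an LSTM classify the
# resulting feature sequence into a single event label (softmax output).
model = models.Sequential([
    layers.Input((CHUNK_LEN, H, W, C)),
    layers.TimeDistributed(frame_encoder),
    layers.LSTM(128),
    layers.Dense(NUM_EVENTS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The design point the first part argues for is visible here: the RNN sees an ordered chunk of frame features rather than one frame at a time, so temporally extended events can be recognized from motion across the chunk.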
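Contribution 1) is only described at a high level in this record; the following is a minimal sketch of an automatic frame-generator script, assuming OpenCV and a hypothetical sampling rate and directory layout rather than the dissertation's own script.

```python
# Minimal sketch of an automatic frame generator for gameplay footage.
# The sampling stride, file naming, and paths are illustrative assumptions.
import os
import cv2

def extract_frames(video_path: str, out_dir: str, every_nth: int = 2) -> int:
    """Save every `every_nth` frame of a gameplay video as a PNG; return count."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:            # end of footage
            break
        if index % every_nth == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.png"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Example usage (hypothetical file names):
# extract_frames("mario_run_01.mp4", "frames/run_01")
```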
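For contributions 3) and 4), the sketch below shows one common way to fine-tune a standard MobileNetV2 and attach a multi-label head: per-label sigmoid outputs with binary cross-entropy let a single frame or chunk carry several simultaneous event labels, which is the defining property of the MIML setting. The unfreezing depth, head size, and `NUM_EVENTS` are assumptions, not values from the dissertation.

```python
# Minimal sketch of fine-tuning MobileNetV2 for multi-label event
# classification (assumptions: TensorFlow/Keras, 8 event labels, and an
# arbitrary choice to fine-tune only the last ~30 layers).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_EVENTS = 8  # hypothetical number of possibly co-occurring event labels

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = True
for layer in base.layers[:-30]:   # freeze early layers; tune late blocks
    layer.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    # Independent sigmoids: each scene may exhibit several events at once.
    layers.Dense(NUM_EVENTS, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(multi_label=True)])
```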