Music genre classification using convolutional neural networks and data augmentation

Bibliographic details
Year of defense: 2017
Main author: Aguiar, Rafael de Lima
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: por
Defending institution: Universidade Estadual de Maringá
Brazil
Department of Informatics
Graduate Program in Computer Science
UEM
Maringá, PR
Technology Center
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://repositorio.uem.br:8080/jspui/handle/1/2501
Abstract: In this work, we present a master's dissertation addressing automatic music genre classification as a pattern recognition task. The content of the music pieces was described using features obtained in the visual domain, from spectrograms created from the audio signal. This kind of image has been used successfully in this task since 2011 by exploring its main visual attribute, texture. In this work, the patterns were described using representation learning, for which convolutional neural networks (CNNs) were used. The CNN is a deep learning architecture that has been widely used in the pattern recognition literature. Deep learning is inspired by the human brain, and CNNs by the mammalian visual system. Overfitting is a recurrent problem when a classification problem is addressed with CNNs; it may occur due to the combination of a lack of training samples and a high-dimensional space. To address this problem, we propose to explore data augmentation techniques. In this application domain, examples of data augmentation techniques are cropping spectrogram images, changing the pitch of a music piece, and separating the harmonic and percussive components of the sound. Such procedures are applied to both the training and testing sets. We present results obtained with the Latin Music Database; the best accuracy we obtained is close to the state of the art and outperforms the best system we know of that is based only on CNNs.
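
As an illustration of the cropping idea mentioned in the abstract, the following minimal Python sketch slices a spectrogram into fixed-width patches along the time axis and, at test time, combines per-patch predictions. It is not the dissertation's implementation. The function names, the patch width and hop, the toy classifier, and the majority-vote rule are all assumptions introduced here; the abstract only states that the augmentation procedures are applied to both the training and testing sets, not how the per-patch outputs are combined. A real system would replace `toy_classifier` with a trained CNN.

```python
from collections import Counter
from typing import Callable, List

# A spectrogram is modeled as a list of rows (frequency bins),
# each row holding one value per time frame.
Spectrogram = List[List[float]]


def crop_time_patches(spec: Spectrogram, width: int, hop: int) -> List[Spectrogram]:
    """Slice a spectrogram into fixed-width patches along the time axis.

    `width` and `hop` are measured in time frames; both are illustrative
    parameters, not values taken from the dissertation.
    """
    if width <= 0 or hop <= 0:
        raise ValueError("width and hop must be positive")
    n_frames = len(spec[0]) if spec else 0
    patches = []
    for start in range(0, n_frames - width + 1, hop):
        # Keep every frequency row, but only frames [start, start + width).
        patches.append([row[start:start + width] for row in spec])
    return patches


def predict_by_vote(
    spec: Spectrogram,
    classify: Callable[[Spectrogram], str],
    width: int,
    hop: int,
) -> str:
    """Classify each patch and return the most frequent label.

    Majority voting is an assumed aggregation rule for this sketch.
    """
    patches = crop_time_patches(spec, width, hop)
    if not patches:
        raise ValueError("spectrogram is shorter than one patch")
    votes = Counter(classify(patch) for patch in patches)
    return votes.most_common(1)[0][0]


def toy_classifier(patch: Spectrogram) -> str:
    """Hypothetical stand-in for a trained CNN: thresholds the mean value."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return "high_energy" if mean > 0.4 else "low_energy"


if __name__ == "__main__":
    # Toy spectrogram: 2 frequency bins, 10 time frames.
    demo = [[0.0] * 4 + [1.0] * 6 for _ in range(2)]
    print(len(crop_time_patches(demo, width=4, hop=2)))
    print(predict_by_vote(demo, toy_classifier, width=4, hop=2))
```

With overlapping patches (a hop smaller than the width), a single music piece yields several training images, which is the sense in which cropping acts as data augmentation. The other two techniques named in the abstract, pitch changing and harmonic/percussive separation, operate on the audio signal and are not covered by this sketch.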