A new Android malware detection method based on multimodal deep learning and hybrid analysis

Bibliographic details
Year of defense: 2022
Main author: Oliveira, Angelo Schranko de
Advisor: Sassi, Renato José
Examination committee: Sassi, Renato José; Lopes, Fábio Silva; Silva, Leandro Augusto da; Dias, Cleber Gustavo; Martins, Fellipe Silva
Document type: Thesis
Access type: Open access
Language: English
Defending institution: Universidade Nove de Julho
Graduate program: Programa de Pós-Graduação em Informática e Gestão do Conhecimento
Department: Informática
Country: Brazil
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: http://bibliotecatede.uninove.br/handle/tede/3096
Abstract: In today's world, where almost everything is digitized, cybercrime is on the rise as criminals continue to develop new ways to attack information systems. One of the main tools used in cybercrime operations is malware, or malicious software. Malware detection is a challenging problem that has been actively explored by both industry and academia using intelligent methods. On the one hand, traditional Machine Learning (ML) malware detection methods rely on manual feature engineering, which requires expert knowledge. On the other hand, Deep Learning (DL) malware detection methods perform automatic feature learning but usually require much more data and processing power. Moreover, Malware Analysis (MA) produces multiple data modalities that can be used for detection purposes. Thus, the general objective of this dissertation was to develop and evaluate a new Android malware detection method, named Chimera, based on Multimodal Deep Learning (MDL) and Hybrid Analysis (HA), using different data modalities and combining manual and automatic feature engineering to increase the Android malware detection rate. To train, optimize, and evaluate the models, the Knowledge Discovery in Databases (KDD) process was implemented using a new dataset based on the publicly available Android benchmark dataset Omnidroid, containing Static Analysis (SA) and Dynamic Analysis (DA) data extracted from 22,000 real malware and goodware samples. By leveraging a hybrid source of information to learn high-level feature representations for both the static and dynamic properties of Android applications, Chimera outperformed its unimodal DL subnetworks, classical ML methods, and Ensemble ML methods. The results of this dissertation therefore show that the right combination of multimodal data, specialized DL methods, and manual and automatic feature engineering can significantly increase the Android malware detection rate.