Dynamic ensemble of classifiers and security relevant methods of android’s API : an empirical study
Ano de defesa: | 2022 |
---|---|
Autor(a) principal: | |
Orientador(a): | |
Banca de defesa: | |
Tipo de documento: | Dissertação |
Tipo de acesso: | Acesso embargado |
Idioma: | eng |
Instituição de defesa: |
Universidade Federal de Pernambuco
UFPE Brasil Programa de Pos Graduacao em Ciencia da Computacao |
Programa de Pós-Graduação: |
Não Informado pela instituição
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: | |
Link de acesso: | https://repositorio.ufpe.br/handle/123456789/45398 |
Resumo: | The Android operating system provides functions and methods to handle sensitive data to se- cure users’ data. Sensitive data is every data that can identify the user, such as GPS location, biometric data, and banking data. The Android security literature proposes extracting binary features from a method and classifying the method into one of the Security Relevant Method’s classes, adding information about how the method handles sensitive data. However, there is a gap in the literature where Dynamic Ensemble algorithms are not evaluated. Dynamic En- semble techniques are state of the art on Multiple Classifiers Systems, which do not explicitly address the problem of a dataset of binary features. Thus, this work tackles the gap related to Dynamic Ensemble applied to Security Relevant Methods classification. Our analyzes show that, unlikely initially stated in the literature, SVM is not the best classifier for this problem, being MLP, Random Forest, Gradient Boosted Decision Trees, and META-DES using Random Forest as pool generation gives the best results. We also find that, in general, Dynamic En- semble algorithms have a disadvantage compared to monolithic classifiers. Furthermore, this disadvantage is exacerbated in algorithms that use distance-based classifiers, such as OLP. When using the Triplet Loss embedding algorithm, we observed an increase in performance for kNN and OLP, but not for other Dynamic Ensemble techniques, showing that a set of binary features has a more significant impact on these algorithms. |