Modelos Lomax assimétricos: uma nova abordagem para a classificação de dados binários desbalanceados
Ano de defesa: | 2023 |
---|---|
Autor(a) principal: | |
Orientador(a): | |
Banca de defesa: | |
Tipo de documento: | Dissertação |
Tipo de acesso: | Acesso aberto |
Idioma: | por |
Instituição de defesa: |
Universidade Federal de São Carlos
Câmpus São Carlos |
Programa de Pós-Graduação: |
Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: | |
Área do conhecimento CNPq: | |
Link de acesso: | https://repositorio.ufscar.br/handle/20.500.14289/18245 |
Resumo: | Imbalanced data refers to a dataset where one class has significantly fewer observations than the other class. This can lead to poor performance of both machine learning algorithms and statistical models, since most of these tools assume that the data has the same proportion of observations in both categories. To deal with this challenge, several authors suggest the use of asymmetric link functions in binary regression, instead of the well-known symmetric link functions: logit and probit. Thus, it is possible not only to improve the predictive performance of the model, but also to reduce the bias in the estimation of parameters and probabilities. This is a solution that generates probabilistic models, which excel in decision-making compared to those that simply assign a single class without considering the associated probability. Therefore, this work aims to present new asymmetric link functions generated from the transformations of the Lomax distribution. These functions include the Double Lomax (DLomax), Power Double Lomax (PDLomax), and Reverse Power Double Lomax (RPDLomax) distributions. The proposed functions have proven asymmetry and can be easily implemented in statistical softwares. In addition, the simulation study indicates that these functions can perform better than logistic regression in various imbalanced classification scenarios. They also proved to be promising in modeling real-world datasets, as in this work we obtained better results than classic link functions in two applications. |