Using a fairness-utility trade-off metric to systematically benchmark non-generative fair adversarial learning strategies

Bibliographic details
Year of defense: 2022
Main author: Lima, Luiz Fernando Fonsêca Pinheiro de
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Federal da Paraíba (UFPB), Brazil, Programa de Pós-Graduação em Informática
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufpb.br/jspui/handle/123456789/26323
Abstract: Artificial intelligence systems for decision-making have become increasingly popular in several areas. However, biased decisions can be identified in many applications, which has become a concern for the computer science, artificial intelligence, and law communities. Therefore, researchers are proposing solutions to mitigate bias and discrimination in decision-makers. Some of the explored strategies are based on generative adversarial networks to generate fair data. Others are based on adversarial learning, achieving fairness in machine learning by encoding fairness constraints through an adversarial model. Moreover, each proposal usually assesses its model with a specific metric, making the comparison of current approaches a complex task. Therefore, this work proposes a benchmark procedure with a systematic method to assess fair machine learning models. To this end, we define the FU-score metric to evaluate the utility-fairness trade-off, the utility and fairness metrics that compose this assessment, the datasets and data preparation used, and the statistical test applied. We also performed this benchmark evaluation on the non-generative adversarial models, analyzing the models from the literature under the same metric perspective. This assessment could not indicate a single model that performs best across all datasets. However, we built an understanding, with statistical confidence, of how each model performs on each dataset.
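
The abstract describes the FU-score as a combined utility-fairness trade-off metric but does not give its formula. The sketch below is only an illustrative stand-in, assuming a harmonic mean of a utility metric (accuracy) and a fairness metric (1 minus the demographic parity difference); the function names and this particular formulation are assumptions, not the dissertation's actual definition.

```python
import numpy as np

def demographic_parity_difference(y_pred, sensitive):
    """Absolute gap in positive-prediction rates between two sensitive
    groups (0 = parity, 1 = maximal disparity)."""
    rate_a = y_pred[sensitive == 0].mean()
    rate_b = y_pred[sensitive == 1].mean()
    return abs(rate_a - rate_b)

def fu_score(y_true, y_pred, sensitive):
    """Hypothetical fairness-utility trade-off score: harmonic mean of
    accuracy and (1 - demographic parity difference). Illustrative only;
    the dissertation's FU-score definition may differ."""
    utility = (y_true == y_pred).mean()
    fairness = 1.0 - demographic_parity_difference(y_pred, sensitive)
    if utility + fairness == 0:
        return 0.0
    return 2 * utility * fairness / (utility + fairness)

# Toy usage with a binary sensitive attribute
y_true    = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred    = np.array([1, 0, 1, 0, 0, 1, 1, 0])
sensitive = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(round(fu_score(y_true, y_pred, sensitive), 3))
```

A single score of this kind lets different fairness-aware models be ranked on one scale, which is the role the abstract attributes to the FU-score in the benchmark procedure.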