An investigative analysis of obvious and non-obvious bias in judicial data using supervised and unsupervised machine learning techniques

Bibliographic details
Year of defence: 2021
Main author: Silva, Bruno dos Santos Fernandes da
Advisor: Abreu, Marjory Cristiany da Costa
Defence committee: Not informed by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal do Rio Grande do Norte
Graduate programme: Programa de Pós-Graduação em Sistemas e Computação
Department: Not informed by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufrn.br/handle/123456789/33299
Abstract: Brazilian courts have been working on the virtualisation of judicial processes since the beginning of this century and, since then, a massive volume of data has been produced. Computational techniques have been a close ally in facing the growing number of accumulated and new lawsuits in the system. However, although automation solutions are often mistakenly assumed to be 'intelligent', which in most cases is not true, there has been no real discussion about the use of intelligent solutions for this purpose, nor about the issues raised by automatic prediction and decision making based on historical data in this context. One problem that has already come to light is bias in judicial datasets worldwide. This work analyses a judicial dataset looking for decision bias and assessing the suitability of intelligent algorithms. Motivated by the social impact of bias in the decision-making process, we selected the gender and social condition of the indicted as the classes under investigation. We used a dataset of judicial sentences (built by the Além da Pena research group), identified its structure and distribution, built supervised and unsupervised machine learning models on the dataset, and analysed the occurrence of obvious and non-obvious bias in judicial decisions. To investigate obvious bias, we applied classification techniques based on the k-Nearest Neighbours, Naive Bayes and Decision Tree algorithms; for non-obvious bias, we used unsupervised algorithms such as k-Means and Hierarchical Clustering. Our experiments led to results that do not conclusively detect bias, but suggest a trend that would confirm its occurrence in the dataset and, therefore, the need for deeper analysis and improved techniques.
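
Illustration: the sketch below shows, in scikit-learn terms, the general kind of analysis the abstract describes: supervised classifiers (kNN, Naive Bayes, Decision Tree) probing for obvious bias and clustering (k-Means, hierarchical) probing for non-obvious bias. It is not the author's actual pipeline; the file name "judicial_sentences.csv" and the column names "decision" and "gender" are hypothetical placeholders, since the Além da Pena dataset is not reproduced in this record.

    # Minimal sketch, assuming a tabular CSV with a decision outcome column
    # ("decision") and a sensitive attribute column ("gender"); all names are
    # hypothetical placeholders, not the thesis's real schema.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.cluster import KMeans, AgglomerativeClustering
    from sklearn.metrics import accuracy_score

    df = pd.read_csv("judicial_sentences.csv")          # hypothetical file
    X = pd.get_dummies(df.drop(columns=["decision"]))   # one-hot encode categorical features
    y = df["decision"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

    # Obvious bias: if classifiers predict the decision well from features that
    # include (or proxy for) gender/social condition, the decisions may be biased.
    for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=5)),
                      ("Naive Bayes", GaussianNB()),
                      ("Decision Tree", DecisionTreeClassifier(random_state=42))]:
        clf.fit(X_tr, y_tr)
        print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))

    # Non-obvious bias: cluster without labels and check whether the resulting
    # clusters align with the sensitive attribute.
    for name, algo in [("k-Means", KMeans(n_clusters=2, n_init=10, random_state=42)),
                       ("Hierarchical", AgglomerativeClustering(n_clusters=2))]:
        labels = algo.fit_predict(X)
        print(name, "cluster vs. gender:")
        print(pd.crosstab(labels, df["gender"]))

The design choice mirrors the abstract's split: high classification accuracy on sensitive-attribute proxies hints at obvious bias, while a strong cluster-to-attribute association, found without any labels, hints at non-obvious bias.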