Exploring pseudo-labeling for reject inference

Martins, Margarida

Exploring pseudo-labeling for reject inference

Bibliographic Details
Main Author:	Martins, Margarida
Publication Date:	2024
Format:	Master thesis
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	http://hdl.handle.net/10400.14/44863
Summary:	Banks use algorithms to estimate the credit risk of loan applicants. However, we need to retrain these models. When retraining, we only know the label, meaning whether the applicant defaulted or not, for those accepted for the loan. Retraining only with the accepted will result in biased models and losses for the bank due to selection bias. To counteract this issue, we can infer the labels of those rejected. This is known as reject inference. In this thesis, we will pursue pseudo-labeling to do reject inference, which needs two models, the first to create the pseudo-labels for the rejected and the second to make the final predictions. We will create the pseudo-labels by training a lightGBM on the available data. Afterward, we will apply a logistic regression as the final model. We will compare the results against a baseline, setting all rejected to a category (default /not default). In addition, we will compare to a scenario where the rejection results from random decision-making, experiment five rejection rates, and see the effect of setting to default vs. not default. We found that doing lightGBM to infer the labels had a lower F1 score, AUC, and profit for the bank. As such, the bank should set all rejected to a category. Additionally, we found that setting all to default has a higher recall in the rejected population and higher profit. Moreover, a lower rejection rate increases profits.

Item metadata

id	RCAP_ef31ad5d25b2d7bff481fbd08d9770d3
oai_identifier_str	oai:repositorio.ucp.pt:10400.14/44863
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
spelling	Exploring pseudo-labeling for reject inferenceMachine learningPseudo-labelingReject inferenceSelection biasBanks use algorithms to estimate the credit risk of loan applicants. However, we need to retrain these models. When retraining, we only know the label, meaning whether the applicant defaulted or not, for those accepted for the loan. Retraining only with the accepted will result in biased models and losses for the bank due to selection bias. To counteract this issue, we can infer the labels of those rejected. This is known as reject inference. In this thesis, we will pursue pseudo-labeling to do reject inference, which needs two models, the first to create the pseudo-labels for the rejected and the second to make the final predictions. We will create the pseudo-labels by training a lightGBM on the available data. Afterward, we will apply a logistic regression as the final model. We will compare the results against a baseline, setting all rejected to a category (default /not default). In addition, we will compare to a scenario where the rejection results from random decision-making, experiment five rejection rates, and see the effect of setting to default vs. not default. We found that doing lightGBM to infer the labels had a lower F1 score, AUC, and profit for the bank. As such, the bank should set all rejected to a category. Additionally, we found that setting all to default has a higher recall in the rejected population and higher profit. Moreover, a lower rejection rate increases profits.Brandão, SusanaVeritatiMartins, Margarida2024-04-30T15:47:49Z2024-01-252024-012024-01-25T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10400.14/44863urn:tid:203590783enginfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2025-03-13T15:46:00Zoai:repositorio.ucp.pt:10400.14/44863Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-29T02:15:18.714060Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv	Exploring pseudo-labeling for reject inference
title	Exploring pseudo-labeling for reject inference
spellingShingle	Exploring pseudo-labeling for reject inference Martins, Margarida Machine learning Pseudo-labeling Reject inference Selection bias
title_short	Exploring pseudo-labeling for reject inference
title_full	Exploring pseudo-labeling for reject inference
title_fullStr	Exploring pseudo-labeling for reject inference
title_full_unstemmed	Exploring pseudo-labeling for reject inference
title_sort	Exploring pseudo-labeling for reject inference
author	Martins, Margarida
author_facet	Martins, Margarida
author_role	author
dc.contributor.none.fl_str_mv	Brandão, Susana Veritati
dc.contributor.author.fl_str_mv	Martins, Margarida
dc.subject.por.fl_str_mv	Machine learning Pseudo-labeling Reject inference Selection bias
topic	Machine learning Pseudo-labeling Reject inference Selection bias
description	Banks use algorithms to estimate the credit risk of loan applicants. However, we need to retrain these models. When retraining, we only know the label, meaning whether the applicant defaulted or not, for those accepted for the loan. Retraining only with the accepted will result in biased models and losses for the bank due to selection bias. To counteract this issue, we can infer the labels of those rejected. This is known as reject inference. In this thesis, we will pursue pseudo-labeling to do reject inference, which needs two models, the first to create the pseudo-labels for the rejected and the second to make the final predictions. We will create the pseudo-labels by training a lightGBM on the available data. Afterward, we will apply a logistic regression as the final model. We will compare the results against a baseline, setting all rejected to a category (default /not default). In addition, we will compare to a scenario where the rejection results from random decision-making, experiment five rejection rates, and see the effect of setting to default vs. not default. We found that doing lightGBM to infer the labels had a lower F1 score, AUC, and profit for the bank. As such, the bank should set all rejected to a category. Additionally, we found that setting all to default has a higher recall in the rejected population and higher profit. Moreover, a lower rejection rate increases profits.
publishDate	2024
dc.date.none.fl_str_mv	2024-04-30T15:47:49Z 2024-01-25 2024-01 2024-01-25T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10400.14/44863 urn:tid:203590783
url	http://hdl.handle.net/10400.14/44863
identifier_str_mv	urn:tid:203590783
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833601285771755520

Exploring pseudo-labeling for reject inference

Similar Items