Bibliographic details
Defense year: 2023
Main author: Yoshida Junior, Valter Takuo
Advisor: Schiozer, Rafael Felipe
Defense committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defense institution: Not informed by the institution
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Keywords in English:
Access link: https://hdl.handle.net/10438/33946
Abstract:
Large databases and Machine Learning have increased our ability to produce credit scoring models with varying numbers of observations and explanatory variables. Although managers and regulators have concerns about the potential risks associated with algorithms' discretion in variable selection and model building, and with the lack of causality, insufficient attention has been given to the inappropriate use of high hit-rate credit scoring models, that is, to credit scoring model risk. This study fills this gap by proposing a novel model risk measure, the Credit Scoring Model Risk, based on the correlation between the dependent variable and the generated predictions. This work empirically tests the measure in plug-in LASSO credit scoring models and finds that adding loans from different banks to increase the number of observations is not optimal on an in-sample basis, challenging the generally accepted assumption that more data leads to better predictions. However, the evaluation of model performance using in-sample data may exhibit instability across out-of-time estimations. Therefore, decision-making (choosing a model among a variety of possibilities) based exclusively on in-sample measures may be problematic, because banks' loan portfolios change over time, models can be born uncalibrated (or not well fitted to the current portfolio), and models can behave differently under new macroeconomic conditions or during exogenous and stochastic events. This work also proposes a procedure to forecast the best-performing model in out-of-time datasets. Three (complementary) approaches help the model user choose between the segmented and full-data models for out-of-time applications by predicting which model tends to have the higher correlation (and hence the lower model risk). The first approach is based on the concept of "shrinkage"; the second uses a Monte Carlo simulation; and the third is a Bayesian estimation of covariances.
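The core idea of the abstract, a model-risk proxy built from the correlation between observed outcomes and model predictions, can be sketched as follows. This is only an illustration of the concept: the thesis's exact functional form is not given in the abstract, so the `1 - correlation` definition, the function name, and the simulated data are all assumptions.

```python
import numpy as np

def credit_scoring_model_risk(y, y_hat):
    """Illustrative model-risk proxy (assumption, not the thesis's exact
    definition): lower correlation between observed defaults and the
    model's predictions implies higher model risk."""
    corr = np.corrcoef(y, y_hat)[0, 1]
    return 1.0 - corr

# Hypothetical data: 0/1 default indicators and predicted default scores.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000).astype(float)
y_hat = np.clip(0.7 * y + 0.15 + rng.normal(0, 0.2, size=1000), 0, 1)

risk = credit_scoring_model_risk(y, y_hat)
print(f"model risk proxy: {risk:.3f}")
```

Under this proxy, a model whose predictions track the realized outcomes closely scores near 0, while an uninformative or inverted model scores near 1 or above, which mirrors the abstract's framing that the higher-correlation model carries the lower model risk.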