Painting the black box white: a fundamentalist-based trading strategy using interpretable trees

Bibliographic details
Year of defense: 2020
Main author: Possatto, André Bina
Advisor: Fernandes, Marcelo
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: eng
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Keywords in English:
Access link: https://hdl.handle.net/10438/29461
Abstract: Difficulty understanding how a black box model makes predictions has undermined machine learning’s success in financial markets, according to a recent article from Bloomberg (2019b). Our work shows how model-agnostic methods for interpreting machine learning predictions make these models more transparent to a human investor. We benchmark three tree-based algorithms against one another, creating long-short investment strategies with independent models for each leg and using only fundamental analysis. We then apply the models to the Brazilian stock market (Bovespa) and achieve an out-of-sample expected annual return of 26.4% with a Sharpe ratio of 0.50. Ensembles of the long and short legs improve this result to a Sharpe ratio of up to 1.26, comparable to other works in the literature reported by Avramov et al. (2019) when considering real-world constraints. Our strategy has low asset turnover, and transaction costs do not explain the results. All models achieve positive risk premiums, and two of them are statistically significant. Interpretation shows differences in the key predictors of over- and underperformance, with the former driven mainly by price-to-value and the latter by size and liquidity. Local interpretation is discussed in the case of Magazine Luiza, showing how model explanation helps an investor understand the models’ output and decide which stocks to buy or sell. We argue that the differences in performance and interpretation between the long and short models, together with the possibility of ensembling them, are key advantages of modeling these positions separately.
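For readers unfamiliar with the performance figures quoted in the abstract, the following is a minimal sketch of how a long-short return series and its annualized Sharpe ratio are commonly computed. It is not taken from the dissertation: the function names, the monthly rebalancing frequency (`periods_per_year=12`), the zero risk-free rate, and the sample returns are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def long_short_returns(long_leg, short_leg):
    """Per-period return of a long-short portfolio: long leg minus short leg.

    Both arguments are hypothetical lists of simple per-period returns.
    """
    if len(long_leg) != len(short_leg):
        raise ValueError("both legs must cover the same periods")
    return [l - s for l, s in zip(long_leg, short_leg)]


def annualized_sharpe(returns, periods_per_year=12, risk_free_per_period=0.0):
    """Mean excess return over its sample standard deviation, annualized.

    Assumes i.i.d. per-period returns, so the ratio scales with the
    square root of the number of periods per year.
    """
    excess = [r - risk_free_per_period for r in returns]
    vol = stdev(excess)  # sample standard deviation (n - 1 denominator)
    if vol == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return mean(excess) / vol * math.sqrt(periods_per_year)


# Made-up monthly returns for illustration only, not data from the study.
long_leg = [0.02, 0.01, 0.03, -0.01]
short_leg = [0.01, -0.01, 0.01, 0.00]
portfolio = long_short_returns(long_leg, short_leg)
sharpe = annualized_sharpe(portfolio)
```

Under these assumptions, a higher Sharpe ratio for an ensemble of the two legs than for either leg alone means the combination earns more return per unit of volatility, which is the kind of improvement the abstract reports.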