Automated CV screening

Bibliographic Details
Main Author: Hauptfleisch, Mário Rivotti
Publication Date: 2018
Format: Master thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10071/32236
Summary: This dissertation uses machine learning to predict the outcome of job applications. The focus is on the first stage of the recruitment process, where the number of applications is very large. The data used to train the classifiers and predict the targets was provided by a company's human resources tool, which receives structured online applications together with the corresponding CVs. The applications are for different job positions in the same company. The first part of the experiments covers data preprocessing: it describes the variables and how preprocessing techniques prepare them for use by the machine learning algorithms. The second part presents the predictions made by the different algorithms used in this thesis: Decision Tree, Random Forest, Gradient Boosted Tree, SVM and Artificial Neural Networks, all from Python's scikit-learn (sklearn) package. The predicted outcomes are whether the candidate passed the first stage of the screening process, whether the candidate failed the overall process, whether the candidate was hired, and the grades that the hiring managers assigned to the applications. Results showed that using the variable "Job ID", which identifies the job each candidate is applying for, improved the predictions significantly: without "Job ID" the best accuracies were around 75%, and with it they were around 90%. Overall, the Random Forest and Gradient Boosted Tree gave the best results. The attributes that contributed most to predicting the different targets were the specification and area of study, the highest level of education achieved, the number of languages spoken, and the distance from home to work.
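The summary reports that adding "Job ID" raised the best accuracies from about 75% to about 90%. One plausible mechanism, offered here as an assumption and not as the thesis's own analysis, is that pass rates differ between job openings, so the job identifier alone already carries predictive signal. The sketch below illustrates that idea with invented toy data and a per-group majority-class baseline, which is deliberately simpler than the five sklearn classifiers the thesis used. The names `TRAIN`, `TEST`, `fit`, `majority`, `accuracy` and the `(job_id, passed)` record layout are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (job_id, passed_first_stage) pairs.
# The values are invented for illustration only; they are NOT the thesis data.
TRAIN = [("J1", 1)] * 8 + [("J1", 0)] * 2 + [("J2", 0)] * 9 + [("J2", 1)] * 1
TEST = [("J1", 1)] * 4 + [("J1", 0)] * 1 + [("J2", 0)] * 4 + [("J2", 1)] * 1


def majority(labels):
    """Return the most frequent label in a non-empty sequence."""
    return Counter(labels).most_common(1)[0][0]


def fit(rows):
    """Learn a global majority label and a majority label per job ID."""
    global_label = majority([y for _, y in rows])
    grouped = defaultdict(list)
    for job_id, y in rows:
        grouped[job_id].append(y)
    per_job = {job_id: majority(ys) for job_id, ys in grouped.items()}
    return global_label, per_job


def accuracy(rows, predict):
    """Fraction of rows whose label equals predict(job_id)."""
    return sum(predict(job_id) == y for job_id, y in rows) / len(rows)


global_label, per_job = fit(TRAIN)

# Baseline 1 ignores the job ID and always predicts the global majority label.
acc_without = accuracy(TEST, lambda job_id: global_label)
# Baseline 2 conditions on the job ID; unseen IDs fall back to the global label.
acc_with = accuracy(TEST, lambda job_id: per_job.get(job_id, global_label))

print(f"without job ID: {acc_without:.2f}")  # without job ID: 0.50
print(f"with job ID:    {acc_with:.2f}")     # with job ID:    0.80
```

Both baselines are fitted on `TRAIN` and scored on the held-out `TEST` rows, so the gain is not an in-sample artefact. A tree ensemble such as the Random Forest named above can learn the same kind of split on an encoded job ID alongside the other attributes; the toy numbers here say nothing about the thesis's actual data or results.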
id RCAP_61acb45288d17a46b026af57c658c8dd
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/32236
network_acronym_str RCAP
repository_id_str https://opendoar.ac.uk/repository/7160
dc.subject.por.fl_str_mv Machine learning
Curriculum vitae
dc.date.none.fl_str_mv 2018-11-21T00:00:00Z
2018-11-21
2018-12
2024-08-28T14:47:45Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/32236
TID:203666690
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt