Automated CV screening
| Main Author: | Hauptfleisch, Mário Rivotti |
|---|---|
| Publication Date: | 2018 |
| Format: | Master's thesis |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full text: | http://hdl.handle.net/10071/32236 |
| Summary: | This dissertation uses machine learning to predict the outcome of job applications, focusing on the first stage of the recruitment process, where the number of applications is very large. The data used to train the classifiers and predict the targets was provided by a company's human resources tool, which receives structured online applications together with the corresponding CVs. The applications are for different job positions in the same company. The first part of the experiments covers data preprocessing: it describes the variables and how preprocessing techniques transform them for use by the machine learning algorithms. The second part presents the predictions made by the different algorithms. The algorithms used in this thesis are Decision Tree, Random Forest, Gradient Boosted Tree, SVM and Artificial Neural Networks, all from Python's scikit-learn (sklearn) package. The predicted outcomes are whether the candidate passed the first stage of the screening process, whether the candidate failed the overall process, whether the candidate was hired, and the grades that hiring managers gave to the applications. Results showed that using the variable "Job ID", which identifies the job each candidate is applying for, improved the predictions significantly: without "Job ID" the best accuracies were around 75%, and with it they were around 90%. Overall, Random Forest and Gradient Boosted Tree achieved the best results. The attributes that contributed most to predicting the different targets were the specification and area of study, the highest level of education achieved, the number of languages spoken, and the distance from home to work. |
| OAI identifier: | oai:repositorio.iscte-iul.pt:10071/32236 |
| Subjects: | Machine learning; Curriculum vitae |
| Dates: | 2018-11-21; 2018-12; 2024-08-28 |
| Version: | Published version |
| Other identifier: | TID:203666690 |
| Access rights: | Open access |
| File format: | application/pdf |