Graph databases for HR relationships

Rafael Araújo Moura

Bibliographic Details
Main Author:	Rafael Araújo Moura
Publication Date:	2021
Format:	Master thesis
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	https://hdl.handle.net/10216/137426
Summary:	Human Resources (HR) data modeling makes heavy use of graph-like structures to hold objects (positions, contracts, jobs, cost centers, org units, etc.) and their relationships (an org unit reports to another org unit, a position belongs to an org unit and reports to another position, etc.). Traditionally, HR databases are relational, and querying graph data stored in a relational database is highly inefficient. The main objective of this dissertation is to compare, in terms of performance and flexibility, a relational database and a graph database when both hold highly connected data, as is the case with HR data. To that end, a database was modeled on a real scenario from a company that deals with this type of data on a daily basis. Queries were formulated that interact with all stored entities and the relationships between them, allowing performance results to be collected for both databases. The written queries were also compared in terms of readability and expressiveness, with the objective of determining in which language it is easier to understand what is being queried and how much simpler it is to formulate the query. When executing queries that traverse a data structure almost completely, the graph database performed much better than the relational database, with execution times as much as 200 times smaller than those obtained in SQL. For hierarchical queries, only when the number of manager relationships per employee was increased by a factor of ten did Neo4j perform up to 3.5 times faster than SQL. These results concern databases with a maximum of one million employees, and it is reasonable to expect that the differences in performance would grow for even larger and more connected databases.
Furthermore, it was concluded that database modeling in graphs is more intuitive and immediate, and that queries are faster to formulate, simpler and more readable in Cypher than in SQL. The analysed Cypher queries had half the lines of their SQL equivalents and use ASCII characters to represent nodes and relationships when pattern matching. More concise queries lower the probability of errors and make it easier for new developers to catch up on, understand and work with previously written queries.
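To make the SQL versus Cypher contrast in the summary concrete, the following is a minimal sketch of one hierarchical query (the management chain above a position) written both ways. The `positions` table, the `reports_to` column, the `Position` label and the `REPORTS_TO` relationship type are hypothetical names chosen for illustration; they are not taken from the thesis, and the queries are not the ones it benchmarked. The SQL runs against an in-memory SQLite database, while the Cypher text is shown only for comparison and is not executed.

```python
import sqlite3

# Hypothetical schema (not the thesis's): one row per position, with a
# self-referencing reports_to column pointing at the manager's position.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE positions (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        reports_to INTEGER REFERENCES positions(id)
    );
    INSERT INTO positions VALUES
        (1, 'CEO', NULL),
        (2, 'VP Engineering', 1),
        (3, 'Team Lead', 2),
        (4, 'Developer', 3);
""")

# Relational version: a recursive common table expression walks the
# reports_to links upward, one level per recursion step.
SQL_CHAIN = """
WITH RECURSIVE chain(id, title, reports_to, depth) AS (
    SELECT p.id, p.title, p.reports_to, 1
    FROM positions AS p
    JOIN positions AS s ON s.reports_to = p.id
    WHERE s.id = ?
    UNION ALL
    SELECT p.id, p.title, p.reports_to, c.depth + 1
    FROM positions AS p
    JOIN chain AS c ON c.reports_to = p.id
)
SELECT title FROM chain ORDER BY depth;
"""

# Graph version, for comparison only (not executed here): a single
# variable-length pattern expresses the same traversal. Without an
# explicit ORDER BY, Cypher does not guarantee the order of the rows.
CYPHER_CHAIN = """
MATCH (:Position {id: $id})-[:REPORTS_TO*]->(m:Position)
RETURN m.title
"""

def management_chain(position_id):
    """Return manager titles from the direct manager up to the root."""
    rows = conn.execute(SQL_CHAIN, (position_id,)).fetchall()
    return [title for (title,) in rows]

print(management_chain(4))
```

In this toy case the recursive SQL query takes roughly a dozen lines against two for the Cypher pattern. That is only an illustration of the kind of difference the summary describes, not a reproduction of its line counts or timings.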

Item metadata

id	RCAP_48d5db0687b96b22dcfe7b585ff81fef
oai_identifier_str	oai:repositorio-aberto.up.pt:10216/137426
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv	Graph databases for HR relationships
title	Graph databases for HR relationships
author	Rafael Araújo Moura
author_facet	Rafael Araújo Moura
author_role	author
dc.contributor.author.fl_str_mv	Rafael Araújo Moura
dc.subject.por.fl_str_mv	Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering
topic	Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering
publishDate	2021
dc.date.none.fl_str_mv	2021-10-11 2021-10-11T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10216/137426 TID:202827780
url	https://hdl.handle.net/10216/137426
identifier_str_mv	TID:202827780
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833599682899607552

Similar Items