Comparison of record linkage methods
Main author: | Vieira, Marcus André Alves Zimmermann |
---|---|
Publication date: | 2023 |
Other authors: | Louise e Silva, Karoline |
Document type: | Article |
Language: | eng |
Source title: | GeSec |
Full text: | https://ojs.revistagesec.org.br/secretariado/article/view/2171 |
Abstract: | Record linkage is an important tool for database integration. It becomes even more valuable amid deeper budget cuts and steadily falling response rates in traditional surveys. Linking records expands the possibilities for cross-tabulation with variables absent from the original database. However, the literature describes many different record-pairing methods. The objective of this paper is therefore to compare well-known record linkage methods. The comparison was carried out on a synthetic dataset. The methods were compared quantitatively using the Precision, Recall, and F-measure metrics, with two string comparison functions: Levenshtein and Jaro-Winkler. Among the six types of classifiers analyzed, the supervised methods achieved the best results. |
id |
SINSESP_ee6910cecf5b5a87bfdff82b40e4c85e |
---|---|
oai_identifier_str |
oai:ojs2.revistagesec.org.br:article/2171 |
network_acronym_str |
SINSESP |
network_name_str |
GeSec |
repository_id_str |
|
spelling |
Comparison of record linkage methods Record Linkage Data Cleaning Comparison Classification Quality Record linkage is an important tool for database integration. It becomes even more valuable amid deeper budget cuts and steadily falling response rates in traditional surveys. Linking records expands the possibilities for cross-tabulation with variables absent from the original database. However, the literature describes many different record-pairing methods. The objective of this paper is therefore to compare well-known record linkage methods. The comparison was carried out on a synthetic dataset. The methods were compared quantitatively using the Precision, Recall, and F-measure metrics, with two string comparison functions: Levenshtein and Jaro-Winkler. Among the six types of classifiers analyzed, the supervised methods achieved the best results. Revista de Gestão e Secretariado 2023-05-18 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion application/pdf https://ojs.revistagesec.org.br/secretariado/article/view/2171 10.7769/gesec.v14i5.2171 Revista de Gestão e Secretariado (Management and Administrative Professional Review); Vol. 14 No. 5 (2023): Revista de Gestão e Secretariado v.14, n.5, 2023; 7999-8004 Revista de Gestão e Secretariado; Vol. 14 Núm. 5 (2023): Revista de Gestão e Secretariado v.14, n.5, 2023; 7999-8004 Revista de Gestão e Secretariado; v. 14 n. 5 (2023): Revista de Gestão e Secretariado v.14, n.5, 2023; 7999-8004 2178-9010 reponame:GeSec instname:Sindicato das Secretárias do Estado de São Paulo (SINSESP) instacron:SINSESP eng https://ojs.revistagesec.org.br/secretariado/article/view/2171/1142 Vieira, Marcus André Alves Zimmermann Louise e Silva, Karoline info:eu-repo/semantics/openAccess 2023-05-19T10:31:03Z oai:ojs2.revistagesec.org.br:article/2171 Revista https://www.revistagesec.org.br/ ONG https://ojs.revistagesec.org.br/secretariado/oai editor@revistagesec.org.br | gestoreditorial@revistagesec.org.br | rf.sabino@gmail.com 2178-9010 2178-9010 opendoar: 2023-05-19T10:31:03 GeSec - Sindicato das Secretárias do Estado de São Paulo (SINSESP) false |
dc.title.none.fl_str_mv |
Comparison of record linkage methods |
title |
Comparison of record linkage methods |
spellingShingle |
Comparison of record linkage methods Vieira, Marcus André Alves Zimmermann Record Linkage Data Cleaning Comparison Classification Quality |
title_short |
Comparison of record linkage methods |
title_full |
Comparison of record linkage methods |
title_fullStr |
Comparison of record linkage methods |
title_full_unstemmed |
Comparison of record linkage methods |
title_sort |
Comparison of record linkage methods |
author |
Vieira, Marcus André Alves Zimmermann |
author_facet |
Vieira, Marcus André Alves Zimmermann Louise e Silva, Karoline |
author_role |
author |
author2 |
Louise e Silva, Karoline |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Vieira, Marcus André Alves Zimmermann Louise e Silva, Karoline |
dc.subject.por.fl_str_mv |
Record Linkage Data Cleaning Comparison Classification Quality |
topic |
Record Linkage Data Cleaning Comparison Classification Quality |
description |
Record linkage is an important tool for database integration. It becomes even more valuable amid deeper budget cuts and steadily falling response rates in traditional surveys. Linking records expands the possibilities for cross-tabulation with variables absent from the original database. However, the literature describes many different record-pairing methods. The objective of this paper is therefore to compare well-known record linkage methods. The comparison was carried out on a synthetic dataset. The methods were compared quantitatively using the Precision, Recall, and F-measure metrics, with two string comparison functions: Levenshtein and Jaro-Winkler. Among the six types of classifiers analyzed, the supervised methods achieved the best results. |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-05-18 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://ojs.revistagesec.org.br/secretariado/article/view/2171 10.7769/gesec.v14i5.2171 |
url |
https://ojs.revistagesec.org.br/secretariado/article/view/2171 |
identifier_str_mv |
10.7769/gesec.v14i5.2171 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://ojs.revistagesec.org.br/secretariado/article/view/2171/1142 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Revista de Gestão e Secretariado |
publisher.none.fl_str_mv |
Revista de Gestão e Secretariado |
dc.source.none.fl_str_mv |
Revista de Gestão e Secretariado (Management and Administrative Professional Review); Vol. 14 No. 5 (2023): Revista de Gestão e Secretariado v.14, n.5, 2023; 7999-8004 Revista de Gestão e Secretariado; Vol. 14 Núm. 5 (2023): Revista de Gestão e Secretariado v.14, n.5, 2023; 7999-8004 Revista de Gestão e Secretariado; v. 14 n. 5 (2023): Revista de Gestão e Secretariado v.14, n.5, 2023; 7999-8004 2178-9010 reponame:GeSec instname:Sindicato das Secretárias do Estado de São Paulo (SINSESP) instacron:SINSESP |
instname_str |
Sindicato das Secretárias do Estado de São Paulo (SINSESP) |
instacron_str |
SINSESP |
institution |
SINSESP |
reponame_str |
GeSec |
collection |
GeSec |
repository.name.fl_str_mv |
GeSec - Sindicato das Secretárias do Estado de São Paulo (SINSESP) |
repository.mail.fl_str_mv |
editor@revistagesec.org.br | gestoreditorial@revistagesec.org.br | rf.sabino@gmail.com |
_version_ |
1838625559898226688 |