Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study

Bibliographic Details
Main Author: Abbasi, Maryam
Publication Date: 2024
Other Authors: Bernardo, Marco V., Vaz, Paulo, Silva, José, Martins, Pedro, ANTUNES VAZ, PAULO JOAQUIM
Format: Other
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10400.19/9301
Summary: The increasing complexity of managing modern database systems, particularly in terms of optimizing query performance for large datasets, presents significant challenges that traditional methods often fail to address. This paper proposes a comprehensive framework for integrating advanced machine learning (ML) models within the architecture of a database management system (DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised and unsupervised learning techniques to predict query execution times, optimize performance, and dynamically manage workloads. Unlike existing solutions that address specific optimization tasks in isolation, our framework provides a unified platform that supports real-time model inference and automatic database configuration adjustments based on workload patterns. A key contribution of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless interaction between the ML models and the query optimization process. This integration allows for the automatic retraining of models and dynamic workload management, resulting in substantial improvements in both query response times and overall system throughput. Our evaluations using the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution times and a 74% improvement in throughput compared with traditional approaches. Additionally, we address challenges such as potential conflicts in tuning recommendations and the performance overhead associated with ML integration, providing insights for future research directions. This study is motivated by the need for autonomous tuning mechanisms to manage large-scale, hetero geneous workloads while answering key research questions, such as the following: (1) How can machine learning models be integrated into a DBMS to improve query optimization and workload management? (2) What performance improvements can be achieved through dynamic configuration tuning based on real-time workload patterns? Our results suggest that the proposed framework significantly reduces the need for manual database administration while effectively adapting to evolving workloads, offering a robust solution for modern large-scale data environments.
id RCAP_a33ddd93b3bf1b0783965a5699e10cf7
oai_identifier_str oai:repositorio.ipv.pt:10400.19/9301
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Studymachine learning integrationdatabase optimizationquery performancedynamic workload managementPostgreSQLreal-time system tuningThe increasing complexity of managing modern database systems, particularly in terms of optimizing query performance for large datasets, presents significant challenges that traditional methods often fail to address. This paper proposes a comprehensive framework for integrating advanced machine learning (ML) models within the architecture of a database management system (DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised and unsupervised learning techniques to predict query execution times, optimize performance, and dynamically manage workloads. Unlike existing solutions that address specific optimization tasks in isolation, our framework provides a unified platform that supports real-time model inference and automatic database configuration adjustments based on workload patterns. A key contribution of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless interaction between the ML models and the query optimization process. This integration allows for the automatic retraining of models and dynamic workload management, resulting in substantial improvements in both query response times and overall system throughput. Our evaluations using the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution times and a 74% improvement in throughput compared with traditional approaches. Additionally, we address challenges such as potential conflicts in tuning recommendations and the performance overhead associated with ML integration, providing insights for future research directions. This study is motivated by the need for autonomous tuning mechanisms to manage large-scale, hetero geneous workloads while answering key research questions, such as the following: (1) How can machine learning models be integrated into a DBMS to improve query optimization and workload management? (2) What performance improvements can be achieved through dynamic configuration tuning based on real-time workload patterns? Our results suggest that the proposed framework significantly reduces the need for manual database administration while effectively adapting to evolving workloads, offering a robust solution for modern large-scale data environments.MDPIInstituto Politécnico de ViseuAbbasi, MaryamBernardo, Marco V.Vaz, PauloSilva, JoséMartins, PedroANTUNES VAZ, PAULO JOAQUIMSilva, José2025-03-25T10:57:11Z2024-09-182024-09-18T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/otherapplication/pdfhttp://hdl.handle.net/10400.19/9301enghttps://doi.org/10.3390/info15090574info:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2025-03-29T02:31:21Zoai:repositorio.ipv.pt:10400.19/9301Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-29T04:41:06.579412Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
title Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
spellingShingle Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
Abbasi, Maryam
machine learning integration
database optimization
query performance
dynamic workload management
PostgreSQL
real-time system tuning
title_short Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
title_full Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
title_fullStr Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
title_full_unstemmed Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
title_sort Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
author Abbasi, Maryam
author_facet Abbasi, Maryam
Bernardo, Marco V.
Vaz, Paulo
Silva, José
Martins, Pedro
ANTUNES VAZ, PAULO JOAQUIM
author_role author
author2 Bernardo, Marco V.
Vaz, Paulo
Silva, José
Martins, Pedro
ANTUNES VAZ, PAULO JOAQUIM
author2_role author
author
author
author
author
dc.contributor.none.fl_str_mv Instituto Politécnico de Viseu
dc.contributor.author.fl_str_mv Abbasi, Maryam
Bernardo, Marco V.
Vaz, Paulo
Silva, José
Martins, Pedro
ANTUNES VAZ, PAULO JOAQUIM
Silva, José
dc.subject.por.fl_str_mv machine learning integration
database optimization
query performance
dynamic workload management
PostgreSQL
real-time system tuning
topic machine learning integration
database optimization
query performance
dynamic workload management
PostgreSQL
real-time system tuning
description The increasing complexity of managing modern database systems, particularly in terms of optimizing query performance for large datasets, presents significant challenges that traditional methods often fail to address. This paper proposes a comprehensive framework for integrating advanced machine learning (ML) models within the architecture of a database management system (DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised and unsupervised learning techniques to predict query execution times, optimize performance, and dynamically manage workloads. Unlike existing solutions that address specific optimization tasks in isolation, our framework provides a unified platform that supports real-time model inference and automatic database configuration adjustments based on workload patterns. A key contribution of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless interaction between the ML models and the query optimization process. This integration allows for the automatic retraining of models and dynamic workload management, resulting in substantial improvements in both query response times and overall system throughput. Our evaluations using the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution times and a 74% improvement in throughput compared with traditional approaches. Additionally, we address challenges such as potential conflicts in tuning recommendations and the performance overhead associated with ML integration, providing insights for future research directions. This study is motivated by the need for autonomous tuning mechanisms to manage large-scale, hetero geneous workloads while answering key research questions, such as the following: (1) How can machine learning models be integrated into a DBMS to improve query optimization and workload management? (2) What performance improvements can be achieved through dynamic configuration tuning based on real-time workload patterns? Our results suggest that the proposed framework significantly reduces the need for manual database administration while effectively adapting to evolving workloads, offering a robust solution for modern large-scale data environments.
publishDate 2024
dc.date.none.fl_str_mv 2024-09-18
2024-09-18T00:00:00Z
2025-03-25T10:57:11Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/other
format other
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.19/9301
url http://hdl.handle.net/10400.19/9301
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv https://doi.org/10.3390/info15090574
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv MDPI
publisher.none.fl_str_mv MDPI
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602119174717440