Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
| Main Author: | |
|---|---|
| Publication Date: | 2024 |
| Other Authors: | , , , , |
| Format: | Other |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | http://hdl.handle.net/10400.19/9301 |
Summary: | The increasing complexity of managing modern database systems, particularly in terms of optimizing query performance for large datasets, presents significant challenges that traditional methods often fail to address. This paper proposes a comprehensive framework for integrating advanced machine learning (ML) models within the architecture of a database management system (DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised and unsupervised learning techniques to predict query execution times, optimize performance, and dynamically manage workloads. Unlike existing solutions that address specific optimization tasks in isolation, our framework provides a unified platform that supports real-time model inference and automatic database configuration adjustments based on workload patterns. A key contribution of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless interaction between the ML models and the query optimization process. This integration allows for the automatic retraining of models and dynamic workload management, resulting in substantial improvements in both query response times and overall system throughput. Our evaluations using the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution times and a 74% improvement in throughput compared with traditional approaches. Additionally, we address challenges such as potential conflicts in tuning recommendations and the performance overhead associated with ML integration, providing insights for future research directions. This study is motivated by the need for autonomous tuning mechanisms to manage large-scale, hetero geneous workloads while answering key research questions, such as the following: (1) How can machine learning models be integrated into a DBMS to improve query optimization and workload management? (2) What performance improvements can be achieved through dynamic configuration tuning based on real-time workload patterns? Our results suggest that the proposed framework significantly reduces the need for manual database administration while effectively adapting to evolving workloads, offering a robust solution for modern large-scale data environments. |
| id |
RCAP_a33ddd93b3bf1b0783965a5699e10cf7 |
|---|---|
| oai_identifier_str |
oai:repositorio.ipv.pt:10400.19/9301 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Studymachine learning integrationdatabase optimizationquery performancedynamic workload managementPostgreSQLreal-time system tuningThe increasing complexity of managing modern database systems, particularly in terms of optimizing query performance for large datasets, presents significant challenges that traditional methods often fail to address. This paper proposes a comprehensive framework for integrating advanced machine learning (ML) models within the architecture of a database management system (DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised and unsupervised learning techniques to predict query execution times, optimize performance, and dynamically manage workloads. Unlike existing solutions that address specific optimization tasks in isolation, our framework provides a unified platform that supports real-time model inference and automatic database configuration adjustments based on workload patterns. A key contribution of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless interaction between the ML models and the query optimization process. This integration allows for the automatic retraining of models and dynamic workload management, resulting in substantial improvements in both query response times and overall system throughput. Our evaluations using the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution times and a 74% improvement in throughput compared with traditional approaches. Additionally, we address challenges such as potential conflicts in tuning recommendations and the performance overhead associated with ML integration, providing insights for future research directions. This study is motivated by the need for autonomous tuning mechanisms to manage large-scale, hetero geneous workloads while answering key research questions, such as the following: (1) How can machine learning models be integrated into a DBMS to improve query optimization and workload management? (2) What performance improvements can be achieved through dynamic configuration tuning based on real-time workload patterns? Our results suggest that the proposed framework significantly reduces the need for manual database administration while effectively adapting to evolving workloads, offering a robust solution for modern large-scale data environments.MDPIInstituto Politécnico de ViseuAbbasi, MaryamBernardo, Marco V.Vaz, PauloSilva, JoséMartins, PedroANTUNES VAZ, PAULO JOAQUIMSilva, José2025-03-25T10:57:11Z2024-09-182024-09-18T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/otherapplication/pdfhttp://hdl.handle.net/10400.19/9301enghttps://doi.org/10.3390/info15090574info:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2025-03-29T02:31:21Zoai:repositorio.ipv.pt:10400.19/9301Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-29T04:41:06.579412Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse |
| dc.title.none.fl_str_mv |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study |
| title |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study |
| spellingShingle |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study Abbasi, Maryam machine learning integration database optimization query performance dynamic workload management PostgreSQL real-time system tuning |
| title_short |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study |
| title_full |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study |
| title_fullStr |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study |
| title_full_unstemmed |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study |
| title_sort |
Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study |
| author |
Abbasi, Maryam |
| author_facet |
Abbasi, Maryam Bernardo, Marco V. Vaz, Paulo Silva, José Martins, Pedro ANTUNES VAZ, PAULO JOAQUIM |
| author_role |
author |
| author2 |
Bernardo, Marco V. Vaz, Paulo Silva, José Martins, Pedro ANTUNES VAZ, PAULO JOAQUIM |
| author2_role |
author author author author author |
| dc.contributor.none.fl_str_mv |
Instituto Politécnico de Viseu |
| dc.contributor.author.fl_str_mv |
Abbasi, Maryam Bernardo, Marco V. Vaz, Paulo Silva, José Martins, Pedro ANTUNES VAZ, PAULO JOAQUIM Silva, José |
| dc.subject.por.fl_str_mv |
machine learning integration database optimization query performance dynamic workload management PostgreSQL real-time system tuning |
| topic |
machine learning integration database optimization query performance dynamic workload management PostgreSQL real-time system tuning |
| description |
The increasing complexity of managing modern database systems, particularly in terms of optimizing query performance for large datasets, presents significant challenges that traditional methods often fail to address. This paper proposes a comprehensive framework for integrating advanced machine learning (ML) models within the architecture of a database management system (DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised and unsupervised learning techniques to predict query execution times, optimize performance, and dynamically manage workloads. Unlike existing solutions that address specific optimization tasks in isolation, our framework provides a unified platform that supports real-time model inference and automatic database configuration adjustments based on workload patterns. A key contribution of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless interaction between the ML models and the query optimization process. This integration allows for the automatic retraining of models and dynamic workload management, resulting in substantial improvements in both query response times and overall system throughput. Our evaluations using the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution times and a 74% improvement in throughput compared with traditional approaches. Additionally, we address challenges such as potential conflicts in tuning recommendations and the performance overhead associated with ML integration, providing insights for future research directions. This study is motivated by the need for autonomous tuning mechanisms to manage large-scale, hetero geneous workloads while answering key research questions, such as the following: (1) How can machine learning models be integrated into a DBMS to improve query optimization and workload management? (2) What performance improvements can be achieved through dynamic configuration tuning based on real-time workload patterns? Our results suggest that the proposed framework significantly reduces the need for manual database administration while effectively adapting to evolving workloads, offering a robust solution for modern large-scale data environments. |
| publishDate |
2024 |
| dc.date.none.fl_str_mv |
2024-09-18 2024-09-18T00:00:00Z 2025-03-25T10:57:11Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/other |
| format |
other |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.19/9301 |
| url |
http://hdl.handle.net/10400.19/9301 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
https://doi.org/10.3390/info15090574 |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
MDPI |
| publisher.none.fl_str_mv |
MDPI |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833602119174717440 |