Combining computational linguistics with sentence embedding to create a zero-shot NLIDB

Bibliographic Details
Main Author: Perezhohin, Yuriy
Publication Date: 2024
Other Authors: Peres, Fernando; Castelli, Mauro
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10362/174658
Summary: Perezhohin, Y., Peres, F., & Castelli, M. (2024). Combining computational linguistics with sentence embedding to create a zero-shot NLIDB. Array, 24, 1-11. Article 100368. https://doi.org/10.1016/j.array.2024.100368 --- This work was supported by MyNorth AI Research. This work was partially supported by national funds through the FCT (Fundação para a Ciência e a Tecnologia) by the project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.
id RCAP_8a6c84292de933f8234e1fb8e81442d6
oai_identifier_str oai:run.unl.pt:10362/174658
repository_id_str https://opendoar.ac.uk/repository/7160
Abstract: Accessing relational databases using natural language is a challenging task, with existing methods often suffering from poor domain generalization and high computational costs. In this study, we propose a novel approach that eliminates the training phase while offering high adaptability across domains. Our method combines structured linguistic rules, a curated vocabulary, and pre-trained embedding models to accurately translate natural language queries into SQL. Experimental results on the SPIDER benchmark demonstrate the effectiveness of our approach, with execution accuracy rates of 72.03% on the training set and 70.83% on the development set, while maintaining domain flexibility. Furthermore, the proposed system outperformed two extensively trained models by up to 28.33% on the development set, demonstrating its efficiency. This research presents a significant advancement in zero-shot Natural Language Interfaces for Databases (NLIDBs), providing a resource-efficient alternative for generating accurate SQL queries from plain language inputs.
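Illustrative note: the abstract reports results as execution accuracy. The sketch below shows one common way such a figure can be computed: run the predicted and the reference SQL against the same database and count the pairs whose results agree. It is not the authors' evaluation code. The toy `singer` table, the example queries, and the order-insensitive comparison are all assumptions made for illustration, and the official SPIDER evaluation may apply different comparison rules.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Execute sql and return its rows as a multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose result multisets match.

    Assumption: row order is ignored; a predicted query that fails to
    execute counts as a miss.
    """
    hits = 0
    for predicted, gold in pairs:
        got, want = run(conn, predicted), run(conn, gold)
        if got is not None and got == want:
            hits += 1
    return hits / len(pairs)


# Hypothetical toy database, not taken from the benchmark.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Rui', 45), (3, 'Eva', 27);
""")

pairs = [
    # Identical result: counts as correct.
    ("SELECT COUNT(*) FROM singer", "SELECT COUNT(id) FROM singer"),
    # Different SQL text, same rows: still counts as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Different rows: counts as wrong.
    ("SELECT name FROM singer WHERE age > 40",
     "SELECT name FROM singer WHERE age > 30"),
    # Predicted query references a missing table: counts as wrong.
    ("SELECT name FROM singers", "SELECT name FROM singer"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2%}")
```

Comparing results rather than SQL text means that two differently written queries are both credited when they return the same rows, which is why this metric is usually preferred over exact string matching for text-to-SQL systems.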
dc.contributor.none.fl_str_mv NOVA Information Management School (NOVA IMS)
Information Management Research Center (MagIC) - NOVA Information Management School
RUN
dc.contributor.author.fl_str_mv Perezhohin, Yuriy
Peres, Fernando
Castelli, Mauro
dc.subject.por.fl_str_mv Text to SQL
Natural language processing
Computational linguistics
Sentence embeddings
Computer Science(all)
SDG 9 - Industry, Innovation, and Infrastructure
dc.date.none.fl_str_mv 2024-11-05T23:20:34Z
2024-12
2024-12-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/174658
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv 2590-0056
PURE: 101919824
https://doi.org/10.1016/j.array.2024.100368
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv 11
application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.mail.fl_str_mv info@rcaap.pt