Semantic features analysis for biomedical lexical answer type prediction using ensemble learning approach
Main Author: | Hussain, Fiza Gulzar |
---|---|
Publication Date: | 2024 |
Other Authors: | Wasim, Muhammad; Cheema, Sehrish Munawar; Pires, Ivan Miguel |
Format: | Article |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10773/42167 |
Summary: | Lexical answer type (LAT) prediction is integral to biomedical question–answering (QA) systems. It aims to predict the semantic type of the expected answer to a factoid or list-type biomedical question, and it helps the answer-processing stage of a QA system assign high scores to the most relevant answers. Although LAT prediction has received considerable research attention in diverse domains, it remains challenging in the biomedical domain. There, it is a multi-label classification problem, because one biomedical question may have more than one expected answer type. Achieving high performance is difficult because biomedical questions offer only limited lexical features, from which multiple labels must be assigned to a single question. In this paper, we develop a novel feature set (lexical features, noun concepts, verb concepts, protein–protein interactions, and biomedical entities) derived from these lexical features. We combine the label power set transformation with ensemble learning (bagging) to perform the multi-label classification. We evaluate the proposed methodology on the publicly available multi-label biomedical question dataset (MLBioMedLAT) and compare it with twelve state-of-the-art multi-label classification algorithms. Our proposed method attains a micro-F1 score of 77%, outperforming the baseline model by 25.5%. |
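
The summary names three techniques: the label power set transformation, bagging, and the micro-F1 score. The following is a minimal, standard-library sketch of how they fit together, not the authors' implementation. The toy questions, the label names (`gene`, `protein`, `chemical`, `disease`), the token-overlap base learner `OverlapClassifier`, and all function names are assumptions made for illustration; the record does not state the paper's base classifiers, features, or label inventory.

```python
import random
from collections import Counter


def to_powerset(label_sets):
    """Label power set: map each distinct label *set* to one class id."""
    classes, y = {}, []
    for labels in label_sets:
        y.append(classes.setdefault(frozenset(labels), len(classes)))
    inverse = {cid: key for key, cid in classes.items()}
    return y, inverse


class OverlapClassifier:
    """Hypothetical base learner: copy the class of the training
    question with the highest Jaccard token overlap."""

    def fit(self, X, y):
        self.items = [(set(x.lower().split()), c) for x, c in zip(X, y)]
        return self

    def predict_one(self, x):
        tokens = set(x.lower().split())
        best = max(self.items,
                   key=lambda it: len(it[0] & tokens) / len(it[0] | tokens))
        return best[1]


def bagging_fit(X, y, n_estimators=15, seed=0):
    """Train one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(OverlapClassifier().fit([X[i] for i in idx],
                                              [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Majority vote over the power-set class ids predicted by the ensemble."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]


def micro_f1(true_sets, pred_sets):
    """Micro-F1: pool TP/FP/FN over all questions and labels, then F1."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (illustrative only; not taken from MLBioMedLAT).
questions = [
    "which genes are associated with cystic fibrosis",
    "which drugs treat cystic fibrosis",
    "which genes and proteins interact with BRCA1",
    "what disease is caused by mutations in CFTR",
]
labels = [{"gene"}, {"chemical"}, {"gene", "protein"}, {"disease"}]

y, inverse = to_powerset(labels)
models = bagging_fit(questions, y)
pred = inverse[bagging_predict(models, "which genes interact with TP53")]
print(sorted(pred))
print(micro_f1([{"gene", "protein"}], [set(pred)]))
```

Because the power set transformation turns every distinct label combination into a single class, any ordinary multi-class learner can serve as the base model inside the bagging ensemble. The trade-off is that label combinations never seen in training cannot be predicted.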
id |
RCAP_0fd0020d35dd6db00ba513ddd8a94e89 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/42167 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
dc.title.none.fl_str_mv |
Semantic features analysis for biomedical lexical answer type prediction using ensemble learning approach |
author |
Hussain, Fiza Gulzar |
author_facet |
Hussain, Fiza Gulzar; Wasim, Muhammad; Cheema, Sehrish Munawar; Pires, Ivan Miguel |
author_role |
author |
author2 |
Wasim, Muhammad; Cheema, Sehrish Munawar; Pires, Ivan Miguel |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Hussain, Fiza Gulzar; Wasim, Muhammad; Cheema, Sehrish Munawar; Pires, Ivan Miguel |
dc.subject.por.fl_str_mv |
Multi-label text classification; Biomedical question classification; Feature engineering; Machine learning; Natural language processing (NLP); Lexical answer (LAT); Ensemble learning |
topic |
Multi-label text classification; Biomedical question classification; Feature engineering; Machine learning; Natural language processing (NLP); Lexical answer (LAT); Ensemble learning |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-07-19T16:47:53Z; 2024-01-01T00:00:00Z; 2024 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/42167 |
url |
http://hdl.handle.net/10773/42167 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
10.1007/s10115-024-02113-7 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833597560970805248 |