Semantic features analysis for biomedical lexical answer type prediction using ensemble learning approach

Bibliographic Details
Main Author: Hussain, Fiza Gulzar
Publication Date: 2024
Other Authors: Wasim, Muhammad; Cheema, Sehrish Munawar; Pires, Ivan Miguel
Format: Article
Language: English
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10773/42167
Summary: Lexical answer type (LAT) prediction is integral to biomedical question-answering (QA) systems. It aims to predict the semantic type of the expected answer to a factoid or list-type biomedical question, and it helps the answer processing stage of a QA system assign high scores to the most relevant answers. Although considerable research exists on LAT prediction in diverse domains, it remains challenging in the biomedical domain. Biomedical LAT prediction is a multi-label classification problem, since one biomedical question may have more than one expected answer type. Achieving high performance is difficult because biomedical questions offer limited lexical features, yet multiple labels must be assigned from them. In this paper, we develop a novel feature set (lexical, noun concepts, verb concepts, protein–protein interactions, and biomedical entities) from these lexical features. For multi-label classification, we apply the label power set transformation technique together with ensemble learning with bagging. We evaluate our proposed methodology on the publicly available multi-label biomedical questions dataset (MLBioMedLAT) and compare it with twelve state-of-the-art multi-label classification algorithms. Our proposed method attains a micro-F1 score of 77%, outperforming the baseline model by 25.5%.
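Illustrative sketch: the summary names three building blocks (the label power set transformation, bagging, and the micro-F1 score) without giving implementation details, so the Python below only shows how they fit together in general. Everything specific in it is an assumption made for illustration: the toy questions, the label names, the token-overlap (Jaccard) nearest-neighbour base learner, and the function names are hypothetical and are not the authors' features, classifiers, or data.

```python
import random
from collections import Counter


def to_powerset(label_sets):
    """Label power set: map each distinct label combination to one class id."""
    classes = sorted({frozenset(s) for s in label_sets}, key=sorted)
    index = {c: i for i, c in enumerate(classes)}
    return [index[frozenset(s)] for s in label_sets], classes


def jaccard(a, b):
    """Token-overlap similarity between two sets of tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bagged_predict(train_x, train_y, query, n_estimators=25, seed=0):
    """Bagging: each estimator is a 1-nearest-neighbour rule fitted on a
    bootstrap sample; the ensemble returns the majority-vote class id."""
    rng = random.Random(seed)
    n = len(train_x)
    votes = Counter()
    for _ in range(n_estimators):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        nearest = max(sample, key=lambda i: jaccard(train_x[i], query))
        votes[train_y[nearest]] += 1
    return votes.most_common(1)[0][0]


def micro_f1(true_sets, pred_sets):
    """Micro-F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        tp += len(t & p)
        fp += len(p - t)
        fn += len(t - p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy data, not taken from MLBioMedLAT.
    questions = [
        "which genes encode proteins involved in apoptosis",
        "which drugs treat rheumatoid arthritis",
        "which proteins interact with the gene product of brca1",
        "which disease is treated with the drug imatinib",
    ]
    labels = [{"gene", "protein"}, {"drug"}, {"gene", "protein"}, {"disease", "drug"}]

    train_x = [set(q.split()) for q in questions]
    train_y, classes = to_powerset(labels)

    query = set("which genes encode proteins that interact with p53".split())
    predicted = classes[bagged_predict(train_x, train_y, query)]
    print(sorted(predicted))
    print(micro_f1([{"gene", "protein"}], [predicted]))
```

Because the power set transformation turns each label combination into a single class, any single-label learner can stand in for the base estimator here; the predicted class id is simply decoded back into its label set before scoring.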
Subjects: Multi-label text classification; Biomedical question classification; Feature engineering; Machine learning; Natural language processing (NLP); Lexical answer type (LAT); Ensemble learning
DOI: 10.1007/s10115-024-02113-7
Rights: Open access
File format: application/pdf