Identification of biological mechanisms underlying a multidimensional ASD phenotype using machine learning

Bibliographic details
Main author: Asif, Muhammad
Publication date: 2020
Other authors: Martiniano, Hugo F. M. C., Marques, Ana Rita, Santos, João Xavier, Vilela, Joana, Rasga, Celia, Oliveira, Guiomar, Couto, Francisco M., Vicente, Astrid M.
Document type: Article
Language: English
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: https://hdl.handle.net/10316/106751
https://doi.org/10.1038/s41398-020-0721-1
Abstract: The complex genetic architecture of Autism Spectrum Disorder (ASD) and its heterogeneous phenotype make molecular diagnosis and patient prognosis challenging tasks. To establish more precise genotype-phenotype correlations in ASD, we developed a novel machine-learning integrative approach, which seeks to delineate associations between patients' clinical profiles and disrupted biological processes, inferred from their copy number variants (CNVs) that span brain genes. Clustering analysis of the relevant clinical measures from 2446 ASD cases in the Autism Genome Project identified two distinct phenotypic subgroups. Patients in these clusters differed significantly in ADOS-defined severity, adaptive behavior profiles, intellectual ability, and verbal status, the latter contributing most to cluster stability and cohesion. Functional enrichment analysis of brain genes disrupted by CNVs in these ASD cases identified 15 statistically significant biological processes, including cell adhesion, neural development, cognition, and polyubiquitination, in line with previous ASD findings. A Naive Bayes classifier, generated to predict the ASD phenotypic clusters from disrupted biological processes, achieved high precision (0.82) but low recall (0.39) for a subset of patients with higher biological Information Content scores. This study shows that milder and more severe clinical presentations can have distinct underlying biological mechanisms. It further highlights how machine-learning approaches can reduce clinical heterogeneity by using multidimensional clinical measures, and establishes genotype-phenotype correlations in ASD. However, predictions are strongly dependent on each patient's information content. Findings are therefore a first step toward the translation of genetic information into clinically useful applications, and emphasize the need for larger datasets with comprehensive clinical and biological information.
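The abstract describes a Naive Bayes classifier that predicts a patient's phenotypic cluster from the biological processes disrupted by their CNVs, evaluated by precision and recall. The sketch below is a minimal, hypothetical illustration of that kind of classifier, not the authors' implementation. It assumes each patient is a binary vector (1 = a given process is disrupted), uses a Bernoulli Naive Bayes model with Laplace smoothing, and reports precision and recall for cluster 1. The feature columns, toy data, cluster labels, and function names (`fit_bernoulli_nb`, `predict`, `precision_recall`) are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fit_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[int], alpha: float = 1.0):
    """Estimate class priors and smoothed P(feature_j = 1 | class).

    X: binary vectors (1 = process disrupted); y: cluster labels; alpha: Laplace smoothing.
    """
    n_features = len(X[0])
    priors: Dict[int, float] = {}
    probs: Dict[int, List[float]] = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        probs[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return priors, probs


def predict(model, x: Sequence[int]) -> int:
    """Return the class with the highest log posterior (up to a shared constant)."""
    priors, probs = model
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for xj, pj in zip(x, probs[c]):
            # Bernoulli likelihood: p if the process is disrupted, (1 - p) otherwise
            score += math.log(pj if xj else 1.0 - pj)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def precision_recall(y_true, y_pred, positive: int = 1) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: columns = [process_A, process_B, process_C]
X_train = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[1, 0, 0], [1, 1, 1], [0, 1, 0], [0, 0, 1]]
y_test = [1, 1, 1, 0]

model = fit_bernoulli_nb(X_train, y_train)
y_pred = [predict(model, x) for x in X_test]
print(y_pred)  # [1, 1, 0, 0]
print(precision_recall(y_test, y_pred))  # (1.0, 0.6666666666666666)
```

On this toy data every patient predicted as cluster 1 is correct (precision 1.0), but one true cluster-1 patient is missed (recall 2/3), the same qualitative pattern of precision exceeding recall that the abstract reports. The sketch does not compute the Information Content scores the study used to select the subset of patients on which predictions were made.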
OAI identifier: oai:estudogeral.uc.pt:10316/106751
OpenDOAR repository: https://opendoar.ac.uk/repository/7160
Subjects: Bayes Theorem; DNA Copy Number Variations; Humans; Machine Learning; Phenotype; Autism Spectrum Disorder
Publisher: Springer Nature
Date issued: 2020-01-28
ISSN: 2158-3188
Version: published version
Access rights: open access
Provider: FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia