ReporTree: a surveillance-oriented tool to strengthen the linkage between pathogen genetic clusters and epidemiological data

Bibliographic Details
Main Author: Mixão, Verónica
Publication Date: 2023
Other Authors: Pinto, Miguel; Sobral, Daniel; Di Pasquale, Adriano; Gomes, João Paulo; Borges, Vítor
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10400.18/9084
Summary: Background: Genomics-informed pathogen surveillance strengthens public health decision-making and plays an important role in the prevention and control of infectious diseases. A pivotal outcome of genomic surveillance is the identification of pathogen genetic clusters and their characterization in terms of geotemporal spread or linkage to clinical and demographic data. This task often relies on visual exploration of (large) phylogenetic trees and associated metadata, which is time-consuming and difficult to reproduce.
Results: We developed ReporTree, a flexible bioinformatics pipeline that allows users to explore the complexity of pathogen diversity, rapidly identify genetic clusters at any (or all) distance threshold(s) or cluster stability regions, and generate surveillance-oriented reports based on the available metadata, such as timespan, geography, or vaccination/clinical status. ReporTree can maintain cluster nomenclature across subsequent analyses and generate a nomenclature code that combines cluster information at different hierarchical levels, thus facilitating the active surveillance of clusters of interest. Because it handles several input formats and clustering methods, ReporTree is applicable to multiple pathogens and can be smoothly deployed in routine surveillance bioinformatics workflows with negligible computational and time costs. This is demonstrated through comprehensive benchmarking of (i) the cg/wgMLST workflow with large datasets of four foodborne bacterial pathogens and (ii) the alignment-based SNP workflow with a large dataset of Mycobacterium tuberculosis. To further validate the tool, we reproduced a previous large-scale study on Neisseria gonorrhoeae, demonstrating how ReporTree rapidly identifies the main species genogroups and characterizes them with key surveillance metadata, such as antibiotic resistance data. Using examples for SARS-CoV-2 and the foodborne bacterial pathogen Listeria monocytogenes, we show how the tool is currently a useful asset in genomics-informed routine surveillance and outbreak detection for a wide variety of species.
Conclusions: In summary, ReporTree is a pan-pathogen tool for automated and reproducible identification and characterization of genetic clusters that contributes to sustainable and efficient genomics-informed public health pathogen surveillance. ReporTree is implemented in Python 3.8 and is freely available at https://github.com/insapathogenomics/ReporTree.
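Illustrative note: the idea of identifying clusters at several distance thresholds, combining them into a hierarchical code, and summarizing metadata per cluster can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and is not ReporTree's implementation or command-line interface: the sample names, pairwise distances, country values, function names, and the dotted code format are all hypothetical, and single-linkage grouping is used only as one simple clustering choice.

```python
from collections import defaultdict
from itertools import combinations


def single_linkage_clusters(samples, distances, threshold):
    """Group samples linked by any chain of pairwise distances <= threshold."""
    parent = {s: s for s in samples}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        if distances[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for s in samples:
        groups[find(s)].append(s)
    # Number clusters deterministically: largest first, then by first member.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    return {s: i for i, g in enumerate(ordered, start=1) for s in g}


def summarize(labels, country_of):
    """Per-cluster report: number of samples and the countries observed."""
    members = defaultdict(list)
    for sample, cluster in labels.items():
        members[cluster].append(sample)
    return {
        c: {"n_samples": len(m), "countries": sorted({country_of[s] for s in m})}
        for c, m in members.items()
    }


# Hypothetical toy data (not taken from the article).
samples = ["A", "B", "C", "D"]
distances = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 8, frozenset(("B", "C")): 7,
    frozenset(("A", "D")): 30, frozenset(("B", "D")): 31, frozenset(("C", "D")): 29,
}
country = {"A": "PT", "B": "PT", "C": "ES", "D": "IT"}
thresholds = [5, 10]

labels = {t: single_linkage_clusters(samples, distances, t) for t in thresholds}
# Illustrative hierarchical code: coarse-to-fine cluster numbers joined by dots.
codes = {
    s: ".".join(str(labels[t][s]) for t in sorted(thresholds, reverse=True))
    for s in samples
}
report = summarize(labels[10], country)
```

In this toy example, samples A, B and C share one cluster at the looser threshold while C separates at the stricter one, so the combined code distinguishes C from A and B without losing their shared higher-level grouping.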
id RCAP_edf5cd315cdbbd5ec43292a8c9228c25
oai_identifier_str oai:repositorio.insa.pt:10400.18/9084
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.contributor.none.fl_str_mv Repositório Científico do Instituto Nacional de Saúde
dc.contributor.author.fl_str_mv Mixão, Verónica
Pinto, Miguel
Sobral, Daniel
Di Pasquale, Adriano
Gomes, João Paulo
Borges, Vítor
dc.subject.por.fl_str_mv Automated Pipeline
Genetic Clustering
Genomic Surveillance
Public Health
ReporTree
Genomics Methods
Reference Materials and Methods (Materiais e Métodos de Referência)
publishDate 2023
dc.date.none.fl_str_mv 2023-06-15
2023-06-15T00:00:00Z
2024-02-12T13:10:49Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.18/9084
url http://hdl.handle.net/10400.18/9084
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1756-994X
10.1186/s13073-023-01196-1
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv BMC
publisher.none.fl_str_mv BMC
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt