Improving analysis of meta-omics data with the MOSCA framework

Bibliographic Details
Main Author: Sequeira, João Carlos Sequeira
Publication Date: 2022
Other Authors: Rocha, Miguel, Alves, M. M., Salvador, Andreia Filipa Ferreira
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://hdl.handle.net/1822/79258
Summary: Introduction: Meta-omics is an emergent field of research with many resources available in the form of databases and software. The information stored in databases is not always easily accessible, and software tools for meta-omics are often difficult to use. In this work, we present Meta-Omics Software for Community Analysis (MOSCA), a software framework that implements pipelines for the integrated analysis of metagenomics (MG), metatranscriptomics (MT) and metaproteomics (MP) data. This framework integrates tools allowing access to databases, handling of data and a complete workflow for meta-omics data analysis. Methodology and results: MOSCA was developed in Python 3, takes as input raw files obtained from next-generation sequencing (in FastQ format) and from mass spectrometry (mass spectra in vendor or peak-picked formats), and integrates several tools for MG, MT and MP analysis. These tools are connected through their inputs/outputs by Snakemake, in a fully automated workflow. MG analysis starts with preprocessing of sequencing reads, which automatically configures Trimmomatic to remove adapters and low-quality reads based on FastQC quality reports, and SortMeRNA to remove rRNA reads. Assembly can be performed with MetaSPAdes or Megahit and is followed by binning with MaxBin2, with CheckM used for quality checking. Genes are identified with FragGeneScan and are annotated with both UPIMAPI (homology-based annotation) and reCOGnizer (domain-based annotation), with reference to UniProtKB and eight databases included in the Conserved Domains Database, respectively. Bowtie2 is used to align reads to metagenomes. Protein identification and quantification can be performed either with SearchCLI coupled to PeptideShaker (performing peptide-to-spectrum matching and spectral counting) or with MaxQuant (with quantification at the MS1 level). Differential gene expression analysis is performed with DESeq2, and heatmaps, volcano plots and PCA plots are generated. The expressed enzymes are plotted onto hundreds of KEGG metabolic maps with the tool KEGGCharter, showing the metabolic functions that are differentially expressed and the taxonomic assignment. Tables, heatmaps and other representations obtained with MOSCA provide an interactive, accessible and comprehensive representation of the information obtained from MG, MT and MP analyses. Conclusions: MOSCA performs automatic analyses of MG, MT and MP datasets, integrating over 20 tools to obtain a comprehensive and easy-to-understand representation of microbial activity in different processes and conditions.
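The summary states that the tools are connected through their inputs/outputs by a workflow manager. As a rough illustration of that idea only (this is neither MOSCA's code nor workflow-manager syntax), the Python sketch below derives an execution order by matching each step's inputs to the outputs of other steps. The step names, artifact names and the wiring between steps are assumptions made for this example; only the tool names in the comments are taken from the summary.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step table: step -> (input artifacts, output artifacts).
# Artifact names and dependencies are illustrative assumptions.
STEPS = {
    # FastQC, Trimmomatic, SortMeRNA
    "preprocessing": ({"raw_reads"}, {"clean_reads"}),
    # MetaSPAdes or Megahit
    "assembly": ({"clean_reads"}, {"contigs"}),
    # MaxBin2, CheckM
    "binning": ({"contigs"}, {"bins"}),
    # FragGeneScan
    "gene_calling": ({"contigs"}, {"genes"}),
    # UPIMAPI, reCOGnizer
    "annotation": ({"genes"}, {"annotations"}),
    # Bowtie2
    "read_alignment": ({"clean_reads", "contigs"}, {"read_counts"}),
    # DESeq2
    "differential_expression": ({"read_counts"}, {"de_results"}),
    # KEGGCharter
    "pathway_maps": ({"annotations", "de_results"}, {"kegg_maps"}),
}


def execution_order(steps):
    """Order steps so that every step runs after the steps producing its inputs."""
    # Map each artifact to the step that produces it.
    producer = {}
    for name, (_, outputs) in steps.items():
        for artifact in outputs:
            producer[artifact] = name

    # A step depends on the producers of its inputs; inputs with no
    # producer (here, raw_reads) are treated as user-supplied files.
    graph = {
        name: {producer[a] for a in inputs if a in producer}
        for name, (inputs, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for position, step in enumerate(execution_order(STEPS), start=1):
        print(position, step)
```

Declaring each step only by what it consumes and produces means that adding or swapping a tool changes a single table entry, while the order of execution is recomputed from the dependencies.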
id RCAP_66b173e878a8c9950cdebe82103daf7b
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/79258
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.contributor.none.fl_str_mv Universidade do Minho
dc.subject.por.fl_str_mv metagenomics
metatranscriptomics
metaproteomics
functional annotation
publishDate 2022
dc.date.none.fl_str_mv 2022-05-09
2022-05-09T00:00:00Z
dc.type.driver.fl_str_mv conference object
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.relation.none.fl_str_mv Sequeira, J. C.; Rocha, Miguel; Alves, M. Madalena; Salvador, Andreia F., Improving analysis of meta-omics data with the MOSCA framework. ICBM 2022 - 4th International Conference on Biogas Microbiology. No. OC-OM-06, Braga, Portugal, May 9-11, 44, 2022.
https://www.ceb.uminho.pt/Events/Details/4296
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833594948945969152