Pollen identification by ITS2 metabarcoding: curation of the sequences retrieved from GenBank to build a reference database

Bibliographic Details
Main Author: Quaresma, Andreia
Publication Date: 2022
Other Authors: Keller, Alexander; Rufino, José; Steen, Jozef van der; Pinto, M. Alice
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10198/28341
Summary: Examining the pollen collected by honey bees (Apis mellifera) is a powerful way of studying environmental quality, because the pollen carries information on available plant sources, on spatial and temporal floral diversity, and on chemical contaminants. This requires botanical identification of the pollen, which has typically been done by classical palynology, an approach that is costly, time-consuming and labour-intensive, requires plant taxonomy expertise, and often provides low taxonomic resolution. As high-throughput sequencing becomes increasingly affordable, pollen metabarcoding is gaining momentum as a promising alternative. One of its main drawbacks, however, is the lack of good-quality reference databases for the barcode of choice. BCdatabaser (Keller et al., 2020) was developed to automatically generate a standardized database for the ITS2 barcode from the primary sequence database GenBank. While using BCdatabaser to construct an ITS2 reference database for identifying bee-collected pollen, we noticed several misidentified sequences retrieved from GenBank, which would reduce identification accuracy. The problems were of two types: plant sequences assigned to the wrong plant species, and fungal sequences identified as plants. To overcome these issues, we developed bash and R scripts to curate an ITS2 reference database for pollen identification. The scripts allowed us to identify the fungal sequences retrieved from GenBank and remove them from the database, to perform a pairwise alignment of all the sequences using vsearch v2.14.1 (Rognes et al., 2016), and then to remove all sequences with a low identity percentage through an iterative process in R v4.1.2. Because the curation is automated, the ITS2 database can easily be updated to take advantage of the new sequences that are regularly deposited in GenBank.
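The summary does not spell out the curation logic, so the following is only a minimal sketch of how the two filtering steps might look, written in Python rather than the authors' bash and R. Everything specific here is an assumption for illustration: the function names `drop_fungi` and `curate_by_identity`, the semicolon-separated lineage strings, the criterion (mean identity to other sequences of the same species), and the 90% cutoff. The input `pairs` stands for a table of query, target and percent identity such as a pairwise alignment tool like vsearch could produce.

```python
from collections import defaultdict


def drop_fungi(lineages):
    """Return the IDs whose lineage does not contain the rank name 'Fungi'.

    lineages: dict mapping sequence ID -> 'rank;rank;...' string
    (a hypothetical format chosen for this sketch).
    """
    return {
        seq_id
        for seq_id, lineage in lineages.items()
        if "Fungi" not in lineage.split(";")
    }


def curate_by_identity(pairs, species_of, min_identity=90.0):
    """Iteratively drop the worst-matching sequence until all remaining pass.

    pairs: iterable of (query_id, target_id, percent_identity)
    species_of: dict mapping sequence ID -> species label
    min_identity: assumed cutoff, not a value taken from the summary
    """
    # Keep identities only for pairs labelled with the same species,
    # stored in both directions so lookups are symmetric.
    ident = defaultdict(dict)
    for q, t, pid in pairs:
        if q == t or q not in species_of or t not in species_of:
            continue
        if species_of[q] == species_of[t]:
            ident[q][t] = pid
            ident[t][q] = pid

    kept = set(species_of)
    while True:
        worst_id, worst_mean = None, min_identity
        for seq_id in sorted(kept):
            scores = [p for other, p in ident[seq_id].items() if other in kept]
            if not scores:
                continue  # no same-species partner left to compare against
            mean = sum(scores) / len(scores)
            if mean < worst_mean:
                worst_id, worst_mean = seq_id, mean
        if worst_id is None:
            return kept  # every remaining sequence meets the cutoff
        # Remove only the single worst sequence, then re-evaluate, because
        # one mislabelled record drags down the means of its neighbours.
        kept.remove(worst_id)
```

Removing one sequence per round, instead of everything below the cutoff at once, is the design choice that makes the process iterative: a correctly labelled sequence whose mean is lowered by a single mislabelled record recovers once that record is gone.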
id RCAP_c6ead27b9b34cfa8902ce440b78eea9d
oai_identifier_str oai:bibliotecadigital.ipb.pt:10198/28341
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv Pollen identification by ITS2 metabarcoding: curation of the sequences retrieved from GenBank to build a reference database
dc.contributor.none.fl_str_mv Biblioteca Digital do IPB
dc.contributor.author.fl_str_mv Quaresma, Andreia
Keller, Alexander
Rufino, José
Steen, Jozef van der
Pinto, M. Alice
dc.subject.por.fl_str_mv Pollen DNA metabarcoding
ITS2 database curation
Honeybee
publishDate 2022
dc.date.none.fl_str_mv 2022
2022-01-01T00:00:00Z
2023-05-22T11:14:11Z
dc.type.driver.fl_str_mv conference object
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10198/28341
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Quaresma, Andreia; Keller, Alexander; Rufino, José; Van der Steen, Jozef; Pinto, M. Alice (2022). Pollen identification by ITS2 metabarcoding: curation of the sequences retrieved from GenBank to build a reference database. In EurBee 9 - 9th European Congress of Apidology. Belgrade, Serbia. ISBN 978-86-7078-173-3
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv University of Belgrade, Faculty of Biology
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt