Pollen identification by ITS2 metabarcoding: curation of the sequences retrieved from GenBank to build a reference database
| Main Author: | Quaresma, Andreia |
|---|---|
| Publication Date: | 2022 |
| Other Authors: | Keller, Alexander; Rufino, José; Steen, Jozef van der; Pinto, M. Alice |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | http://hdl.handle.net/10198/28341 |
| Summary: | Examining the pollen collected by honey bees (Apis mellifera) is a powerful way of studying environmental quality, because the pollen carries information on available plant sources, spatial and temporal floral diversity, and chemical contaminants. This requires botanical identification of the pollen, which has typically been done by classical palynology, a costly, time-consuming and labour-intensive approach that often provides low taxonomic resolution and requires plant taxonomy expertise. With high-throughput sequencing becoming increasingly affordable, pollen metabarcoding is gaining momentum as a promising alternative to classical palynology. One of its main drawbacks, however, is the lack of good-quality reference databases for the barcode of choice. BCdatabaser (Keller et al., 2020) was developed to automatically generate a standardized database for the ITS2 barcode from the primary sequence database GenBank. While using BCdatabaser to construct an ITS2 reference database for identifying bee-collected pollen, we noticed several misidentified sequences retrieved from GenBank, which would reduce identification accuracy. There were two types of problems: plant sequences assigned to the wrong plant species, and fungal sequences identified as plants. To overcome these issues, we developed scripts in bash and R to curate an ITS2 reference database for pollen identification. These scripts allowed us to identify the fungal sequences retrieved from GenBank for subsequent removal from the database, to perform a pairwise alignment of all the sequences using vsearch v2.14.1 (Rognes et al., 2016), and then to remove all sequences with a low identity percentage using an iterative process in R v4.1.2. Because the curation is automated, the ITS2 database can easily be updated to take advantage of the new sequences that are regularly deposited in GenBank. |
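
The summary does not state the exact criterion or threshold used in the iterative removal step, so the following is only a minimal Python sketch of one plausible reading, not the authors' bash and R scripts. It assumes the pairwise alignment has already been exported as (sequence, sequence, percent identity) triples, and that a sequence is judged by its mean identity to the other sequences carrying the same species label. The function name `curate`, the 90% cutoff, and the mean-identity rule are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def curate(species_of, identities, threshold=90.0):
    """Iteratively drop the sequence least similar to its conspecifics.

    species_of: dict mapping sequence id -> species label.
    identities: iterable of (id_a, id_b, percent_identity) triples.
    threshold:  hypothetical cutoff on mean within-species identity.
    Returns the set of sequence ids that are kept.
    """
    # Keep only within-species pairs, stored symmetrically for lookup.
    pair_id = defaultdict(dict)
    for a, b, pct in identities:
        if a != b and a in species_of and species_of.get(a) == species_of.get(b):
            pair_id[a][b] = pct
            pair_id[b][a] = pct

    kept = set(species_of)
    while True:
        # Mean identity of each kept sequence to its kept conspecifics.
        # Sequences with no conspecific left get no score and are kept.
        scores = {}
        for seq in kept:
            vals = [pct for other, pct in pair_id[seq].items() if other in kept]
            if vals:
                scores[seq] = mean(vals)
        if not scores:
            break
        # Remove one sequence per round, then recompute all scores.
        worst = min(scores, key=lambda s: (scores[s], s))
        if scores[worst] >= threshold:
            break
        kept.remove(worst)
    return kept
```

Removing one sequence per round and then recomputing matters here: a single mislabelled sequence lowers the mean identity of every correctly labelled conspecific, so a one-pass filter could discard good sequences along with the bad one.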
| id |
RCAP_c6ead27b9b34cfa8902ce440b78eea9d |
|---|---|
| oai_identifier_str |
oai:bibliotecadigital.ipb.pt:10198/28341 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| dc.title.none.fl_str_mv |
Pollen identification by ITS2 metabarcoding: curation of the sequences retrieved from GenBank to build a reference database |
| author |
Quaresma, Andreia |
| author_facet |
Quaresma, Andreia; Keller, Alexander; Rufino, José; Steen, Jozef van der; Pinto, M. Alice |
| author_role |
author |
| author2 |
Keller, Alexander; Rufino, José; Steen, Jozef van der; Pinto, M. Alice |
| author2_role |
author author author author |
| dc.contributor.none.fl_str_mv |
Biblioteca Digital do IPB |
| dc.contributor.author.fl_str_mv |
Quaresma, Andreia; Keller, Alexander; Rufino, José; Steen, Jozef van der; Pinto, M. Alice |
| dc.subject.por.fl_str_mv |
Pollen DNA metabarcoding; ITS2 database curation; Honeybee |
| publishDate |
2022 |
| dc.date.none.fl_str_mv |
2022; 2022-01-01T00:00:00Z; 2023-05-22T11:14:11Z |
| dc.type.driver.fl_str_mv |
conference object |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10198/28341 |
| dc.language.iso.fl_str_mv |
eng |
| dc.relation.none.fl_str_mv |
Quaresma, Andreia; Keller, Alexander; Rufino, José; Van der Steen, Jozef; Pinto, M. Alice (2022). Pollen identification by ITS2 metabarcoding: curation of the sequences retrieved from GenBank to build a reference database. In EurBee 9 - 9th European Congress of Apidology. Belgrade, Serbia; 978-86-7078-173-3 |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
University of Belgrade, Faculty of Biology |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833592236957237248 |