Analysis of long non-coding RNAs in the genome of Arabidopsis thaliana (L.) Heynh

Bibliographic details
Year of defense: 2017
Main author: Araújo, Vanessa Cristina da Silva
Advisor: Novaes, Evandro
Defense committee: Novaes, Evandro; Vianello, Rosana Pereira; Coelho, Alexandre Siqueira Guedes
Document type: Dissertation
Access type: Open access
Language: Portuguese
Defense institution: Universidade Federal de Goiás
Graduate program: Programa de Pós-graduação em Genética e Biologia Molecular
Department: Instituto de Ciências Biológicas - ICB (RG)
Country: Brazil
Access link: http://repositorio.bc.ufg.br/tede/handle/tede/7249
Abstract: Large-scale sequencing of transcripts via RNA-Seq has been changing paradigms by demonstrating that transcription is prevalent throughout the eukaryotic genome. In these organisms, the vast majority of transcripts are non-coding RNAs (ncRNAs). One class of RNA that has attracted great interest, given its prevalence, is the long non-coding RNAs (lncRNAs), which are ncRNAs longer than 200 nucleotides. However, little is known about the role and prevalence of lncRNAs in plant genomes, even in model species such as Arabidopsis thaliana (L.) Heynh. The objective of this work was to identify lncRNAs in the Arabidopsis genome and to characterize their size, structure and nucleotide diversity. The sequences were obtained from a previous study that sequenced total RNA from A. thaliana grown under different light regimes, using the Illumina HiSeq 2000 platform. These sequences were mapped to the reference genome with TopHat and assembled with Cufflinks. The assembled transcripts were compared with the genome annotation using Cuffcompare to identify non-annotated transcripts. A total of 4,305 putative long RNAs were obtained: 314 (7%) sense in relation to coding transcripts (mRNAs), 392 (9%) intergenic, 2,216 (52%) intronic and 1,383 (32%) antisense to mRNAs. These candidates were filtered to eliminate those with coding potential, as well as those related to rRNA, tRNA and miRNA synthesis. A total of 3,710 high-confidence lncRNAs (HC-lncRNAs) were obtained, of which 58.6% had not been previously annotated. These HC-lncRNAs cover a small proportion (~1%) of the Arabidopsis thaliana genome. A functional enrichment analysis of Gene Ontology (GO) categories showed that genes containing lncRNAs are enriched for categories linked to the localization and transport of proteins within the cell, as well as to nucleic acid binding.
A gene expression analysis identified only 22 lncRNAs that were differentially expressed across the light conditions to which the samples were exposed. Using SNP data from the 1001 Genomes Project, we identified high nucleotide diversity within lncRNA regions, indicating low conservation of the primary structure of these transcripts. The nucleotide diversity in lncRNA regions is higher than in coding regions, but lower than the diversity observed in neutral regions such as pseudogenes.
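
The comparison of nucleotide diversity between lncRNA, coding and pseudogene regions can be made concrete with a small sketch. The abstract does not state how diversity was computed, so the code below assumes the standard per-site estimator (mean pairwise differences per site, written as expected heterozygosity with the n/(n-1) correction). The function name `nucleotide_diversity` and the toy alignment are hypothetical illustrations, not data or code from the dissertation or from the 1001 Genomes Project.

```python
from collections import Counter


def nucleotide_diversity(sequences):
    """Per-site nucleotide diversity (pi) for equal-length aligned sequences.

    Assumption: the standard estimator is used here for illustration;
    the dissertation's actual method is not given in the abstract.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0:
        raise ValueError("sequences must not be empty")
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    total = 0.0
    for column in zip(*sequences):
        counts = Counter(column)
        # Sum of squared allele frequencies at this site.
        homozygosity = sum((c / n) ** 2 for c in counts.values())
        # Expected heterozygosity with the n/(n-1) small-sample correction.
        total += (n / (n - 1)) * (1.0 - homozygosity)
    return total / length


# Hypothetical toy alignment: 4 accessions, 8 sites, 2 segregating sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.3f}")
```

Under this estimator, regions are compared by computing pi separately for each class of region (for example lncRNA, coding and pseudogene intervals) and contrasting the resulting values; a higher pi indicates lower conservation of the primary sequence.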