Analysis of the Suvrel method for differential expression from the count matrix generated with RNA-Seq data

Bibliographic details
Year of defense: 2014
Main author: Tambonis, Tiago [UNESP]
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Estadual Paulista (Unesp)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/11449/127589
Abstract: We live in a time when advances in biology-related fields are routine, accustoming us to experiments with large numbers of variables. RNA sequencing (RNA-Seq) is part of this landscape, and the computational approaches applied in this context are not yet fully established and call for more detailed analysis. In a typical differential expression experiment, total RNA or messenger RNA (mRNA) is extracted, purified, fragmented, sequenced, mapped, and finally counted, producing a count table that records how many reads aligned to each gene in each experimental condition. Starting from this table, we propose using a variational method called Suvrel (Supervised Variational Relevance) to infer the relevance of each gene. Suvrel is based on minimizing a cost function that penalizes large distances between elements of the same class and small distances between elements of different classes. The method was applied to the count table produced by sequencing, aligning, and summarizing five technical replicates of Stratagene Universal Human Reference RNA (UHRR) (part of the Sequencing Quality Control Consortium, SEQC) together with ERCC mix 1, and five technical replicates of Ambion's Human Brain Reference RNA (HBRR) (also part of SEQC) together with ERCC mix 2. ROC (Receiver Operating Characteristic) curves were generated from MAC-II project data, labeling transcripts whose log fold change exceeds a cutoff (from 0.5 to 2.0) as true positives and the rest as true negatives, yielding curves 6.2 and 6.4. From these graphs, we conclude that the Suvrel method achieves higher AUCs (areas under the ROC curve) at most cutoffs. Notably, these conclusions were obtained with a method that makes no assumption about the distribution of the reads and uses a simple normalization (dividing the counts of a gene by its standard ...
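The gene-relevance step described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the dissertation's implementation. It assumes a diagonal metric (one weight g_k per gene) and the linear cost J(g) = Σ_k g_k [γ D_same,k − (1 − γ) D_diff,k] with γ = 0.5. Here D_same,k and D_diff,k are the mean squared differences of gene k over same-class and different-class sample pairs, under the constraint Σ_k g_k² = 1. The function name `suvrel_relevance` and the toy counts are hypothetical. Because J is linear in g, its minimum on the unit sphere is g = −c/‖c‖, where c_k is the bracketed term. Genes whose between-class spread exceeds their within-class spread therefore receive positive weight.

```python
from itertools import combinations

import numpy as np


def suvrel_relevance(counts, labels, gamma=0.5):
    """Relevance weight per gene (row) of a genes x samples count table.

    Sketch assumptions (hypothetical, not the dissertation's code):
      * the metric tensor g is diagonal, so g[k] weights gene k;
      * cost J(g) = sum_k g[k] * (gamma * d_same[k] - (1 - gamma) * d_diff[k]),
        where d_same / d_diff are mean squared differences of gene k over
        same-class / different-class sample pairs;
      * constraint sum_k g[k] ** 2 == 1.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    n_genes, n_samples = counts.shape
    same_sum, diff_sum = np.zeros(n_genes), np.zeros(n_genes)
    n_same = n_diff = 0
    for i, j in combinations(range(n_samples), 2):
        sq = (counts[:, i] - counts[:, j]) ** 2  # per-gene squared difference
        if labels[i] == labels[j]:
            same_sum += sq
            n_same += 1
        else:
            diff_sum += sq
            n_diff += 1
    if n_same == 0 or n_diff == 0:
        raise ValueError("need at least one same-class and one cross-class pair")
    # Coefficient of g[k] in the linear cost J(g).
    c = gamma * same_sum / n_same - (1 - gamma) * diff_sum / n_diff
    norm = np.linalg.norm(c)
    if norm == 0:
        raise ValueError("cost is identically zero; relevance is undefined")
    # J is linear in g, so on the unit sphere its minimum is at g = -c / ||c||.
    return -c / norm


# Toy table: 3 genes x 4 samples (two replicates per class); values are made up.
toy_counts = np.array([[10, 12, 50, 52],   # gene 0: shifts between classes
                       [20, 21, 20, 21],   # gene 1: flat
                       [5, 30, 6, 29]])    # gene 2: noisy within classes
toy_labels = np.array(["UHRR", "UHRR", "HBRR", "HBRR"])
weights = suvrel_relevance(toy_counts, toy_labels)
ranking = np.argsort(-weights)  # most relevant gene first
print(ranking)
```

Only the ordering of the weights matters for ranking genes. In this sketch a negative weight marks a gene whose within-class spread outweighs its between-class spread. Normalization of the counts is left out.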
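The ROC evaluation described in the abstract can also be sketched. A transcript counts as a true positive when its reference log fold change exceeds the cutoff. The AUC is computed in its Mann-Whitney form: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. This sketch assumes the cutoff applies to the absolute log fold change. The function names and the toy values are hypothetical and are not MAC-II data.

```python
import numpy as np


def auc_at_cutoff(scores, ref_log_fc, cutoff):
    """AUC of `scores` for detecting transcripts with |ref_log_fc| > cutoff."""
    scores = np.asarray(scores, dtype=float)
    # Assumption: absolute log fold change above the cutoff => true positive.
    truth = np.abs(np.asarray(ref_log_fc, dtype=float)) > cutoff
    pos, neg = scores[truth], scores[~truth]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # AUC is undefined without both classes
    # Compare every positive with every negative (Mann-Whitney statistic).
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def auc_by_cutoff(scores, ref_log_fc, cutoffs=(0.5, 1.0, 1.5, 2.0)):
    """AUC at each cutoff of a sweep between 0.5 and 2.0."""
    return {c: auc_at_cutoff(scores, ref_log_fc, c) for c in cutoffs}


# Made-up example: five transcripts with reference log fold changes and scores.
ref_lfc = [3.0, 1.2, 0.1, -2.5, 0.4]
scores = [0.9, 0.2, 0.1, 0.7, 0.3]
print(auc_by_cutoff(scores, ref_lfc))
```

Computing the AUC by pairwise comparison avoids building the ROC curve explicitly. It costs O(positives × negatives) memory, which is fine for a sketch. For genome-scale tables, a rank-based or library implementation would be preferable.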