Uma Nova Amostragem de Descritores para Predição de Atividade Biológica (A New Descriptor Sampling for Biological Activity Prediction)

João Vitor Soares Tenório

Bibliographic details
Year of defense:	2018
Main author:	João Vitor Soares Tenório
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Dissertation
Access type:	Open access
Language:	Portuguese (por)
Institution of defense:	Universidade Federal de Minas Gerais (UFMG)
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (translated from Portuguese):	QSAR; Machine Learning; Chemometrics; Computer Learning; Bioinformatics; Computing; QSAR (Biochemistry)
Access link:	http://hdl.handle.net/1843/ESBF-BALHJU
Abstract:	Machine learning methods are used to solve a range of problems in bioinformatics and chemometrics. One such problem is computer-aided drug design (CADD), which uses predictive modeling to design and improve biologically active compounds that can be used as drugs. One technique used in CADD is the study of quantitative structure-activity relationships (QSAR), which yields a predictive model relating the properties of compounds to their biological activities; this model is typically a linear regression. LQTA-QSAR is a 4D-QSAR technique in which the descriptors used to train the predictive model are sampled by aligning the conformational ensemble profiles (CEP) of the compounds in a 3D grid and calculating the interaction between the CEP and a probe (an atom, ion, or functional group) at each point of the grid. The problem with this sampling is that the probe crosses the CEP: when the probe falls on or close to an atom of the CEP, some descriptors take unrealistic values. To overcome this problem, this thesis proposes a new approach to descriptor sampling, which uses surface expansions defined by the convex hull to construct layers around the CEP along which the probe must pass. This sampling prevents the probe from visiting points inside or too close to the CEP. To validate the proposal, several experiments were carried out on sets of compounds that can be used as drugs for the treatment of several diseases. The results showed that the proposal built predictive models with greater precision than the original method in the six scenarios evaluated. The highest percentage increase was 44%. We also proposed a workflow in which linear regression is replaced by a regression tree, which yields models that are easier to interpret. Experiments with this new workflow were also carried out in six scenarios: in one case the precision was superior to that of the linear models, and in the other cases it was lower but still satisfactory.
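
The idea of placing the probe on convex-hull layers around the CEP can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function name `probe_layers`, the `offsets` parameter, the choice of facet centers as probe positions, and the random coordinates standing in for an aligned CEP are all assumptions made for illustration. The sketch relies on `scipy.spatial.ConvexHull`, whose `equations` attribute stores the unit outward normal of each facet.

```python
import numpy as np
from scipy.spatial import ConvexHull


def probe_layers(atom_coords, offsets):
    """Return one array of probe positions per layer (hypothetical helper).

    Each layer is built by moving the center of every convex-hull facet
    outward along the facet's unit normal by a fixed distance, so every
    probe position lies at least that far from all atoms.
    """
    atom_coords = np.asarray(atom_coords, dtype=float)
    hull = ConvexHull(atom_coords)
    # Center of each triangular facet: mean of its three vertices.
    facet_centers = atom_coords[hull.simplices].mean(axis=1)
    # Rows of hull.equations are [normal, offset]; normals point outward.
    normals = hull.equations[:, :3]
    return [facet_centers + d * normals for d in offsets]


# Assumption: random points stand in for the atoms of an aligned CEP.
rng = np.random.default_rng(0)
coords = rng.uniform(-3.0, 3.0, size=(40, 3))
layers = probe_layers(coords, offsets=[1.5, 3.0])

for d, pts in zip([1.5, 3.0], layers):
    # Smallest probe-atom distance in this layer.
    gaps = np.linalg.norm(pts[:, None, :] - coords[None, :, :], axis=2)
    print(f"offset {d}: {len(pts)} probe points, min atom distance {gaps.min():.3f}")
```

Because each facet plane separates the hull from the offset point, no probe position in a layer can be closer to an atom than that layer's offset, which is the property the abstract attributes to the new sampling. A denser sampling of each expanded surface, rather than one point per facet, would be a natural refinement.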
