MIDB: um modelo de integração de dados biológicos [MIDB: a biological data integration model]

Perlin, Caroline Beatriz

Bibliographic details
Year of defense:	2012
Main author:	Perlin, Caroline Beatriz
Advisor:	Ciferri, Ricardo Rodrigues
Examination committee:	Not provided by the institution
Document type:	Dissertation
Access type:	Open access
Language:	Portuguese (por)
Defense institution:	Universidade Federal de São Carlos
Graduate program:	Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department:	Not provided by the institution
Country:	BR
Keywords in Portuguese:	Banco de dados; Bioinformática; Modelo de integração de dados; Integração de esquemas; Integração de instâncias; Integração de Dados Biológicos
Keywords in English:	Bioinformatics; Biological Databases; Biological Database Integration; Data Integration Model; Schema Integration; Instance Integration
CNPq knowledge area:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	https://repositorio.ufscar.br/handle/ufscar/497
Abstract:	In bioinformatics, a huge volume of data on biomolecules and on nucleotide and amino acid sequences resides almost entirely in several Biological Databases (BDBs). A given sequence can be described by several classes of information: genomic data, evolutionary data, structural data, and others. Some BDBs store only one or a few of these classes. The BDBs are hosted on different sites and servers and run on various database management systems with different data models. In addition, their instances and schemas may be semantically heterogeneous. In this scenario, the objective of this project is to propose a biological data integration model that adopts new schema integration and instance integration techniques. The proposed model has one mechanism for schema integration and another for instance integration; the latter is supported by a dictionary and allows conflicts in attribute values to be resolved. A clustering algorithm groups similar entities, and a domain specialist takes part in managing the resulting clusters. The model was validated through a case study on schema and instance integration of nucleotide sequence data from organisms of the genus Actinomyces, collected from four different data sources. In the schema integration, about 97.91% of the attributes were correctly categorized; the instance integration identified that about 50% of the clusters created need support from a specialist, which avoids errors in instance resolution. The contributions presented include the Attribute Categorization, the Clustering Algorithm, the proposed distance functions, and the proposed model itself.
