Distributed generative data mining

Bibliographic Details
Main Author: Ruy Ramos
Publication Date: 2007
Other Authors: Rui Camacho
Format: Book
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://hdl.handle.net/10216/67392
Summary: A process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be done on a dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach it is usual the user to specify the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The generative feature of the technique is due to its capability of generating new sub-tasks of the Data Mining analysis process at execution time. The workflow of sub-tasks of the DM is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Ming task. As a proof-of-concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records.
id RCAP_32a2fc94a5e6d7f5447c8213e3c29f5b
oai_identifier_str oai:repositorio-aberto.up.pt:10216/67392
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Distributed generative data miningEngenharia de computadores, Engenharia electrotécnica, electrónica e informáticaComputer engineering, Electrical engineering, Electronic engineering, Information engineeringA process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be done on a dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach it is usual the user to specify the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The generative feature of the technique is due to its capability of generating new sub-tasks of the Data Mining analysis process at execution time. The workflow of sub-tasks of the DM is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Ming task. As a proof-of-concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records.20072007-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/bookapplication/pdfhttps://hdl.handle.net/10216/67392eng10.1007/978-3-540-73435-2_24Ruy RamosRui Camachoinfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2025-02-27T20:08:57Zoai:repositorio-aberto.up.pt:10216/67392Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T23:52:48.709504Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv Distributed generative data mining
title Distributed generative data mining
spellingShingle Distributed generative data mining
Ruy Ramos
Engenharia de computadores, Engenharia electrotécnica, electrónica e informática
Computer engineering, Electrical engineering, Electronic engineering, Information engineering
title_short Distributed generative data mining
title_full Distributed generative data mining
title_fullStr Distributed generative data mining
title_full_unstemmed Distributed generative data mining
title_sort Distributed generative data mining
author Ruy Ramos
author_facet Ruy Ramos
Rui Camacho
author_role author
author2 Rui Camacho
author2_role author
dc.contributor.author.fl_str_mv Ruy Ramos
Rui Camacho
dc.subject.por.fl_str_mv Engenharia de computadores, Engenharia electrotécnica, electrónica e informática
Computer engineering, Electrical engineering, Electronic engineering, Information engineering
topic Engenharia de computadores, Engenharia electrotécnica, electrónica e informática
Computer engineering, Electrical engineering, Electronic engineering, Information engineering
description A process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be done on a dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach it is usual the user to specify the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The generative feature of the technique is due to its capability of generating new sub-tasks of the Data Mining analysis process at execution time. The workflow of sub-tasks of the DM is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Ming task. As a proof-of-concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records.
publishDate 2007
dc.date.none.fl_str_mv 2007
2007-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/book
format book
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/67392
url https://hdl.handle.net/10216/67392
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 10.1007/978-3-540-73435-2_24
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833600325092638720