Distributed generative data mining
| Main Author: | |
|---|---|
| Publication Date: | 2007 |
| Other Authors: | |
| Format: | Book |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | https://hdl.handle.net/10216/67392 |
Summary: | A process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be done on a dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach it is usual the user to specify the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The generative feature of the technique is due to its capability of generating new sub-tasks of the Data Mining analysis process at execution time. The workflow of sub-tasks of the DM is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Ming task. As a proof-of-concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records. |
| id |
RCAP_32a2fc94a5e6d7f5447c8213e3c29f5b |
|---|---|
| oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/67392 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
Distributed generative data miningEngenharia de computadores, Engenharia electrotécnica, electrónica e informáticaComputer engineering, Electrical engineering, Electronic engineering, Information engineeringA process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be done on a dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach it is usual the user to specify the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The generative feature of the technique is due to its capability of generating new sub-tasks of the Data Mining analysis process at execution time. The workflow of sub-tasks of the DM is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Ming task. As a proof-of-concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records.20072007-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/bookapplication/pdfhttps://hdl.handle.net/10216/67392eng10.1007/978-3-540-73435-2_24Ruy RamosRui Camachoinfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2025-02-27T20:08:57Zoai:repositorio-aberto.up.pt:10216/67392Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T23:52:48.709504Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse |
| dc.title.none.fl_str_mv |
Distributed generative data mining |
| title |
Distributed generative data mining |
| spellingShingle |
Distributed generative data mining Ruy Ramos Engenharia de computadores, Engenharia electrotécnica, electrónica e informática Computer engineering, Electrical engineering, Electronic engineering, Information engineering |
| title_short |
Distributed generative data mining |
| title_full |
Distributed generative data mining |
| title_fullStr |
Distributed generative data mining |
| title_full_unstemmed |
Distributed generative data mining |
| title_sort |
Distributed generative data mining |
| author |
Ruy Ramos |
| author_facet |
Ruy Ramos Rui Camacho |
| author_role |
author |
| author2 |
Rui Camacho |
| author2_role |
author |
| dc.contributor.author.fl_str_mv |
Ruy Ramos Rui Camacho |
| dc.subject.por.fl_str_mv |
Engenharia de computadores, Engenharia electrotécnica, electrónica e informática Computer engineering, Electrical engineering, Electronic engineering, Information engineering |
| topic |
Engenharia de computadores, Engenharia electrotécnica, electrónica e informática Computer engineering, Electrical engineering, Electronic engineering, Information engineering |
| description |
A process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be done on a dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach it is usual the user to specify the workflow of the sub-tasks composing the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The generative feature of the technique is due to its capability of generating new sub-tasks of the Data Mining analysis process at execution time. The workflow of sub-tasks of the DM is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Ming task. As a proof-of-concept, the extended system was used to analyse an artificial dataset of a credit scoring problem with eighty million records. |
| publishDate |
2007 |
| dc.date.none.fl_str_mv |
2007 2007-01-01T00:00:00Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/book |
| format |
book |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10216/67392 |
| url |
https://hdl.handle.net/10216/67392 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
10.1007/978-3-540-73435-2_24 |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833600325092638720 |