A commodity platform for Distributed Data Mining - the HARVARD System

Bibliographic Details
Main Author: Ruy Ramos
Publication Date: 2006
Other Authors: Rui Camacho, Pedro Souto
Format: Book
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://repositorio-aberto.up.pt/handle/10216/73310
Summary: Systems performing Data Mining analysis are usually dedicated and expensive. They often require special-purpose machines to run the data analysis tool. In this paper we propose an architecture for distributed Data Mining running on general-purpose desktop computers. The proposed architecture was deployed in the HARVesting Architecture of idle machines foR Data mining (HARVARD) system. The HARVARD system has the following features. It does not require special-purpose or expensive machines, as it runs on general-purpose PCs. It is based on distributed computing using a set of PCs connected in a network. In a Condor fashion, it takes advantage of a distributed setting of available and idle computational resources and is adequate for problems that may be decomposed into coarse-grain subtasks. The system includes dynamic updating of the computational resources. It is written in Java and therefore runs on several different platforms, including Linux and Windows. It has fault-tolerant features that make it quite reliable. It may use a wide variety of data analysis tools without modification, since it is independent of the data analysis tool. It uses an easy but powerful task specification and control language. The HARVARD system was deployed using two data analysis tools: a decision tree tool called C4.5 and an Inductive Logic Programming (ILP) tool.
id RCAP_b96626d7978d8b146da15d269a2c0e70
oai_identifier_str oai:repositorio-aberto.up.pt:10216/73310
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv A commodity platform for Distributed Data Mining - the HARVARD System
title A commodity platform for Distributed Data Mining - the HARVARD System
spellingShingle A commodity platform for Distributed Data Mining - the HARVARD System
Ruy Ramos
Engenharia de computadores, Engenharia electrotécnica, electrónica e informática
Computer engineering, Electrical engineering, Electronic engineering, Information engineering
title_short A commodity platform for Distributed Data Mining - the HARVARD System
title_full A commodity platform for Distributed Data Mining - the HARVARD System
title_fullStr A commodity platform for Distributed Data Mining - the HARVARD System
title_full_unstemmed A commodity platform for Distributed Data Mining - the HARVARD System
title_sort A commodity platform for Distributed Data Mining - the HARVARD System
author Ruy Ramos
author_facet Ruy Ramos
Rui Camacho
Pedro Souto
author_role author
author2 Rui Camacho
Pedro Souto
author2_role author
author
dc.contributor.author.fl_str_mv Ruy Ramos
Rui Camacho
Pedro Souto
dc.subject.por.fl_str_mv Engenharia de computadores, Engenharia electrotécnica, electrónica e informática
Computer engineering, Electrical engineering, Electronic engineering, Information engineering
topic Engenharia de computadores, Engenharia electrotécnica, electrónica e informática
Computer engineering, Electrical engineering, Electronic engineering, Information engineering
description Systems performing Data Mining analysis are usually dedicated and expensive. They often require special-purpose machines to run the data analysis tool. In this paper we propose an architecture for distributed Data Mining running on general-purpose desktop computers. The proposed architecture was deployed in the HARVesting Architecture of idle machines foR Data mining (HARVARD) system. The HARVARD system has the following features. It does not require special-purpose or expensive machines, as it runs on general-purpose PCs. It is based on distributed computing using a set of PCs connected in a network. In a Condor fashion, it takes advantage of a distributed setting of available and idle computational resources and is adequate for problems that may be decomposed into coarse-grain subtasks. The system includes dynamic updating of the computational resources. It is written in Java and therefore runs on several different platforms, including Linux and Windows. It has fault-tolerant features that make it quite reliable. It may use a wide variety of data analysis tools without modification, since it is independent of the data analysis tool. It uses an easy but powerful task specification and control language. The HARVARD system was deployed using two data analysis tools: a decision tree tool called C4.5 and an Inductive Logic Programming (ILP) tool.
publishDate 2006
dc.date.none.fl_str_mv 2006
2006-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/book
format book
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio-aberto.up.pt/handle/10216/73310
url https://repositorio-aberto.up.pt/handle/10216/73310
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833599958669852672