Machine Learning for Adaptive Multi-Core Machines

Bibliographic details
Main author: Lopes, Noel
Publication date: 2014
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: https://hdl.handle.net/10316/23773
Summary: Doctoral thesis in Informatics Engineering, presented to the Faculty of Sciences and Technology of the University of Coimbra
id RCAP_c810b86d66af701d4f8baaad35de893e
oai_identifier_str oai:estudogeral.uc.pt:10316/23773
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Machine Learning for Adaptive Multi-Core Machines
GPU computing
Machine learning
Doctoral thesis in Informatics Engineering, presented to the Faculty of Sciences and Technology of the University of Coimbra
Today, the increasing complexity, performance requirements and cost of current (and future) applications cut across a wide range of activities, from science to industry. The scale of the data generated by Web growth and by advances in sensor data collection technology has been rapidly increasing the magnitude and complexity of the tasks that Machine Learning (ML) algorithms have to solve. This growth is driving the need to extend the applicability of existing ML algorithms to larger datasets and to devise parallel algorithms that scale well with the volume of data or, in other words, can handle “Big Data”. In this Thesis, we partly contribute to solving this problem by making use of two complementary components: a body of novel ML algorithms and a set of high-performance parallel ML implementations for adaptive multi-core machines.
In the first component, a new adaptive step size technique is presented that enhances the convergence of Restricted Boltzmann Machines (RBMs), thereby effectively decreasing the training time of Deep Belief Networks (DBNs). Also proposed is a novel Semi-Supervised Non-Negative Matrix Factorization (SSNMF) algorithm, which aims at extracting the most discriminating characteristics of each class while substantially reducing the overall time required to generate the models. In addition, a novel Incremental Hypersphere Classifier (IHC) with built-in multi-class support is presented, which is able to accommodate memory and computational restrictions while providing good classification performance. This highly scalable algorithm can update models and classify new data in real time, as well as handle concept drift scenarios. Moreover, since it keeps the samples that are near the decision frontier while removing noisy and less relevant ones, it can select a representative subset of the data, so that more sophisticated algorithms can be applied in a fraction of the time required for the complete dataset. A learning framework (IHC-SVM), encompassing the IHC and Support Vector Machine (SVM) algorithms, is validated in a real-world case study of protein membership prediction. Overall, the resulting system outperformed the baseline SVM (with an F-measure of 96.39%) using only a subset of the data (ca. 50%) and demonstrated its capacity to deal with the everyday dynamic changes of real-world biological databases.
In another direction, and motivated by the need to deal with the missing data that often occur in large-scale datasets, a novel solution, designated Neural Selective Input Model (NSIM), is proposed. The method empowers Neural Networks (NNs) with the ability to handle Missing Values (MVs) and outperforms single imputation techniques, while offering classification performance better than or similar to that of state-of-the-art multiple imputation methods. With the new methodology we have successfully addressed a real-world case study of bankruptcy prediction in a large dataset of French companies, with results (F-measure of 95.70%) that are superior to previous approaches.
The backbone of the second component of this Thesis is a Graphics Processing Unit (GPU) computational framework, named GPU Machine Learning Library (GPUMLib), which aims at providing the building blocks for developing high-performance GPU parallel ML software, promoting cooperation within the field and contributing to the development of innovative applications. The rationale consists of taking advantage of the GPU high-throughput parallel architecture to expand the scalability of supervised, semi-supervised and unsupervised ML algorithms. Since its release, GPUMLib, now with over 2,000 downloads, has benefited researchers worldwide. New GPU parallel implementations of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) supervised algorithms, integrating the NSIM, are presented, providing significant speedups (up to 180×). In particular, these implementations played an important role in the detection of Ventricular Arrhythmias (VAs) (with a sensitivity of 98.07%), improving on previous work by reducing the computation time from weeks to hours. In this line, an Autonomous Training System (ATS) is designed to automatically find high-quality GPU solutions. On the unsupervised side, a GPU parallel implementation of the CD-k algorithm is presented, which considerably boosts the training speed of RBMs and DBNs, achieving speedups of up to 46×. Additionally, new GPU parallel implementations of the Non-Negative Matrix Factorization (NMF) algorithm are presented, yielding speedups of up to 706×. Both unsupervised implementations are tested on benchmarks and on real datasets. Overall, this Thesis contributes to the use of adaptive multi-core machines for exploring “Big Data”, which, we hope, will have a positive impact on solving otherwise intractable ML problems.
2014-01-27
doctoral thesis
info:eu-repo/semantics/publishedVersion
LOPES, Noel de Jesus Mendonça - Machine learning for adaptive multi-core machines. Coimbra : [s.n.], 2013. Tese de doutoramento. Disponível na WWW: http://hdl.handle.net/10316/23773
https://hdl.handle.net/10316/23773
TID:101277237
eng
Lopes, Noel
info:eu-repo/semantics/openAccess
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
2020-05-25T12:23:35Z
oai:estudogeral.uc.pt:10316/23773
Aggregator Portal
NGO
https://www.rcaap.pt/oai/openaire
info@rcaap.pt
opendoar:https://opendoar.ac.uk/repository/7160
2025-05-29T05:19:24.188152
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
false
dc.title.none.fl_str_mv Machine Learning for Adaptive Multi-Core Machines
title Machine Learning for Adaptive Multi-Core Machines
spellingShingle Machine Learning for Adaptive Multi-Core Machines
Lopes, Noel
GPU computing
Machine learning
title_short Machine Learning for Adaptive Multi-Core Machines
title_full Machine Learning for Adaptive Multi-Core Machines
title_fullStr Machine Learning for Adaptive Multi-Core Machines
title_full_unstemmed Machine Learning for Adaptive Multi-Core Machines
title_sort Machine Learning for Adaptive Multi-Core Machines
author Lopes, Noel
author_facet Lopes, Noel
author_role author
dc.contributor.author.fl_str_mv Lopes, Noel
dc.subject.por.fl_str_mv GPU computing
Machine learning
topic GPU computing
Machine learning
description Doctoral thesis in Informatics Engineering, presented to the Faculty of Sciences and Technology of the University of Coimbra
publishDate 2014
dc.date.none.fl_str_mv 2014-01-27
dc.type.driver.fl_str_mv doctoral thesis
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv LOPES, Noel de Jesus Mendonça - Machine learning for adaptive multi-core machines. Coimbra : [s.n.], 2013. Tese de doutoramento. Disponível na WWW: http://hdl.handle.net/10316/23773
https://hdl.handle.net/10316/23773
TID:101277237
identifier_str_mv LOPES, Noel de Jesus Mendonça - Machine learning for adaptive multi-core machines. Coimbra : [s.n.], 2013. Tese de doutoramento. Disponível na WWW: http://hdl.handle.net/10316/23773
TID:101277237
url https://hdl.handle.net/10316/23773
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602319066857472