Mining Extremes through Fuzzy Clustering

Detalhes bibliográficos
Autor(a) principal: Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa
Data de Publicação: 2018
Tipo de documento: Dissertação
Idioma: eng
Título da fonte: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Texto Completo: http://hdl.handle.net/10362/61679
Resumo: Archetypes are extreme points that synthesize data representing "pure" individual types. Archetypes are assigned by the most discriminating features of data points, and are almost always useful in applications when one is interested in extremes and not on commonalities. Recent applications include talent analysis in sports and science, fraud detection, profiling of users and products in recommendation systems, climate extremes, as well as other machine learning applications. The furthest-sum Archetypal Analysis (FS-AA) (Mørup and Hansen, 2012) and the Fuzzy Clustering with Proportional Membership (FCPM) (Nascimento, 2005) propose distinct models to find clusters with extreme prototypes. Even though the FCPM model does not impose its prototypes to lie in the convex hull of data, it belongs to the framework of data recovery from clustering (Mirkin, 2005), a powerful property for unsupervised cluster analysis. The baseline version of FCPM, FCPM-0, provides central prototypes whereas its smooth version, FCPM-2 provides extreme prototypes as AA archetypes. The comparative study between FS-AA and FCPM algorithms conducted in this dissertation covers the following aspects. First, the analysis of FS-AA on data recovery from clustering using a collection of 100 data sets of diverse dimensionalities, generated with a proper data generator (FCPM-DG) as well as 14 real world data. Second, testing the robustness of the clustering algorithms in the presence of outliers, with the peculiar behaviour of FCPM-0 on removing the proper number of prototypes from data. Third, a collection of five popular fuzzy validation indices are explored on accessing the quality of clustering results. Forth, the algorithms undergo a study to evaluate how different initializations affect their convergence as well as the quality of the clustering partitions. The Iterative Anomalous Pattern (IAP) algorithm allows to improve the convergence of FCPM algorithm as well as to fine-tune the level of resolution to look at clustering results, which is an advantage from FS-AA. Proper visualization functionalities for FS-AA and FCPM support the easy interpretation of the clustering results.
id RCAP_cf54ce23e97d14a8d004b41b49b98e69
oai_identifier_str oai:run.unl.pt:10362/61679
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Mining Extremes through Fuzzy ClusteringArchetypal analysisFuzzy proportional membershipClustering data recoveryFuzzy data generatorFuzzy validation indicesDomínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e InformáticaArchetypes are extreme points that synthesize data representing "pure" individual types. Archetypes are assigned by the most discriminating features of data points, and are almost always useful in applications when one is interested in extremes and not on commonalities. Recent applications include talent analysis in sports and science, fraud detection, profiling of users and products in recommendation systems, climate extremes, as well as other machine learning applications. The furthest-sum Archetypal Analysis (FS-AA) (Mørup and Hansen, 2012) and the Fuzzy Clustering with Proportional Membership (FCPM) (Nascimento, 2005) propose distinct models to find clusters with extreme prototypes. Even though the FCPM model does not impose its prototypes to lie in the convex hull of data, it belongs to the framework of data recovery from clustering (Mirkin, 2005), a powerful property for unsupervised cluster analysis. The baseline version of FCPM, FCPM-0, provides central prototypes whereas its smooth version, FCPM-2 provides extreme prototypes as AA archetypes. The comparative study between FS-AA and FCPM algorithms conducted in this dissertation covers the following aspects. First, the analysis of FS-AA on data recovery from clustering using a collection of 100 data sets of diverse dimensionalities, generated with a proper data generator (FCPM-DG) as well as 14 real world data. Second, testing the robustness of the clustering algorithms in the presence of outliers, with the peculiar behaviour of FCPM-0 on removing the proper number of prototypes from data. Third, a collection of five popular fuzzy validation indices are explored on accessing the quality of clustering results. Forth, the algorithms undergo a study to evaluate how different initializations affect their convergence as well as the quality of the clustering partitions. The Iterative Anomalous Pattern (IAP) algorithm allows to improve the convergence of FCPM algorithm as well as to fine-tune the level of resolution to look at clustering results, which is an advantage from FS-AA. Proper visualization functionalities for FS-AA and FCPM support the easy interpretation of the clustering results.Nascimento, SusanaRUNMendes, Gonçalo Sancho de Queiroz de Moncada Sousa2019-02-26T10:56:25Z2018-1220182018-12-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://hdl.handle.net/10362/61679enginfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2024-05-22T17:37:26Zoai:run.unl.pt:10362/61679Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T17:08:24.781977Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv Mining Extremes through Fuzzy Clustering
title Mining Extremes through Fuzzy Clustering
spellingShingle Mining Extremes through Fuzzy Clustering
Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa
Archetypal analysis
Fuzzy proportional membership
Clustering data recovery
Fuzzy data generator
Fuzzy validation indices
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
title_short Mining Extremes through Fuzzy Clustering
title_full Mining Extremes through Fuzzy Clustering
title_fullStr Mining Extremes through Fuzzy Clustering
title_full_unstemmed Mining Extremes through Fuzzy Clustering
title_sort Mining Extremes through Fuzzy Clustering
author Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa
author_facet Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa
author_role author
dc.contributor.none.fl_str_mv Nascimento, Susana
RUN
dc.contributor.author.fl_str_mv Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa
dc.subject.por.fl_str_mv Archetypal analysis
Fuzzy proportional membership
Clustering data recovery
Fuzzy data generator
Fuzzy validation indices
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
topic Archetypal analysis
Fuzzy proportional membership
Clustering data recovery
Fuzzy data generator
Fuzzy validation indices
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
description Archetypes are extreme points that synthesize data representing "pure" individual types. Archetypes are assigned by the most discriminating features of data points, and are almost always useful in applications when one is interested in extremes and not on commonalities. Recent applications include talent analysis in sports and science, fraud detection, profiling of users and products in recommendation systems, climate extremes, as well as other machine learning applications. The furthest-sum Archetypal Analysis (FS-AA) (Mørup and Hansen, 2012) and the Fuzzy Clustering with Proportional Membership (FCPM) (Nascimento, 2005) propose distinct models to find clusters with extreme prototypes. Even though the FCPM model does not impose its prototypes to lie in the convex hull of data, it belongs to the framework of data recovery from clustering (Mirkin, 2005), a powerful property for unsupervised cluster analysis. The baseline version of FCPM, FCPM-0, provides central prototypes whereas its smooth version, FCPM-2 provides extreme prototypes as AA archetypes. The comparative study between FS-AA and FCPM algorithms conducted in this dissertation covers the following aspects. First, the analysis of FS-AA on data recovery from clustering using a collection of 100 data sets of diverse dimensionalities, generated with a proper data generator (FCPM-DG) as well as 14 real world data. Second, testing the robustness of the clustering algorithms in the presence of outliers, with the peculiar behaviour of FCPM-0 on removing the proper number of prototypes from data. Third, a collection of five popular fuzzy validation indices are explored on accessing the quality of clustering results. Forth, the algorithms undergo a study to evaluate how different initializations affect their convergence as well as the quality of the clustering partitions. The Iterative Anomalous Pattern (IAP) algorithm allows to improve the convergence of FCPM algorithm as well as to fine-tune the level of resolution to look at clustering results, which is an advantage from FS-AA. Proper visualization functionalities for FS-AA and FCPM support the easy interpretation of the clustering results.
publishDate 2018
dc.date.none.fl_str_mv 2018-12
2018
2018-12-01T00:00:00Z
2019-02-26T10:56:25Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/61679
url http://hdl.handle.net/10362/61679
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833596464489562112