Mining Extremes through Fuzzy Clustering
| Main author: | Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa |
|---|---|
| Publication date: | 2018 |
| Document type: | Dissertation |
| Language: | eng |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | http://hdl.handle.net/10362/61679 |
Abstract: | Archetypes are extreme points that synthesize data representing "pure" individual types. Archetypes are defined by the most discriminating features of the data points and are useful in applications where the interest lies in extremes rather than commonalities. Recent applications include talent analysis in sports and science, fraud detection, profiling of users and products in recommendation systems, climate extremes, and other machine learning applications. The furthest-sum Archetypal Analysis (FS-AA) (Mørup and Hansen, 2012) and the Fuzzy Clustering with Proportional Membership (FCPM) (Nascimento, 2005) propose distinct models for finding clusters with extreme prototypes. Even though the FCPM model does not constrain its prototypes to lie in the convex hull of the data, it belongs to the framework of data recovery from clustering (Mirkin, 2005), a powerful property for unsupervised cluster analysis. The baseline version of FCPM, FCPM-0, provides central prototypes, whereas its smooth version, FCPM-2, provides extreme prototypes similar to AA archetypes. The comparative study between the FS-AA and FCPM algorithms conducted in this dissertation covers the following aspects. First, the analysis of FS-AA on data recovery from clustering, using a collection of 100 data sets of diverse dimensionalities generated with a dedicated data generator (FCPM-DG), as well as 14 real-world data sets. Second, testing the robustness of the clustering algorithms in the presence of outliers, including the peculiar behaviour of FCPM-0 in removing the proper number of prototypes from the data. Third, five popular fuzzy validation indices are explored for assessing the quality of clustering results. Fourth, the algorithms undergo a study to evaluate how different initializations affect their convergence as well as the quality of the clustering partitions. 
The Iterative Anomalous Pattern (IAP) algorithm improves the convergence of the FCPM algorithm and allows fine-tuning the level of resolution at which clustering results are examined, which is an advantage over FS-AA. Dedicated visualization functionalities for FS-AA and FCPM support easy interpretation of the clustering results. |
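
The abstract names furthest-sum seeding as the initialization behind FS-AA. As a rough illustration only, the Python sketch below shows the general idea of picking candidate archetypes that are far apart: start from a random point, repeatedly add the point with the largest summed distance to those already chosen, then swap out the random start. It is a simplified reading of the furthest-sum procedure, not the dissertation's code; the function name `furthest_sum` and the parameters `X`, `k`, and `seed` are hypothetical.

```python
import numpy as np

def furthest_sum(X, k, seed=0):
    """Return k row indices of X chosen by a simplified furthest-sum rule.

    Assumption: this is an illustrative sketch, not the FS-AA implementation
    studied in the dissertation.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Pairwise Euclidean distances between all rows of X.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    taken = np.zeros(n, dtype=bool)
    first = int(rng.integers(n))      # random starting point
    taken[first] = True
    chosen = [first]
    sum_dist = D[first].copy()        # summed distance to the chosen points

    for _ in range(k - 1):
        # Pick the not-yet-chosen point with the largest summed distance.
        nxt = int(np.argmax(np.where(taken, -np.inf, sum_dist)))
        taken[nxt] = True
        chosen.append(nxt)
        sum_dist += D[nxt]

    # Swap out the random start so the result depends less on it.
    taken[first] = False
    chosen.pop(0)
    sum_dist -= D[first]
    chosen.append(int(np.argmax(np.where(taken, -np.inf, sum_dist))))
    return chosen
```

On a toy data set made of the four corners of a unit square plus a few points near its centre, the selected indices are the corners, which matches the intuition that archetype candidates sit at the extremes of the data rather than at its centre.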
| id |
RCAP_cf54ce23e97d14a8d004b41b49b98e69 |
|---|---|
| oai_identifier_str |
oai:run.unl.pt:10362/61679 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| dc.title.none.fl_str_mv |
Mining Extremes through Fuzzy Clustering |
| title |
Mining Extremes through Fuzzy Clustering |
| title_short |
Mining Extremes through Fuzzy Clustering |
| title_full |
Mining Extremes through Fuzzy Clustering |
| title_fullStr |
Mining Extremes through Fuzzy Clustering |
| title_full_unstemmed |
Mining Extremes through Fuzzy Clustering |
| title_sort |
Mining Extremes through Fuzzy Clustering |
| author |
Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa |
| author_facet |
Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Nascimento, Susana; RUN |
| dc.contributor.author.fl_str_mv |
Mendes, Gonçalo Sancho de Queiroz de Moncada Sousa |
| dc.subject.por.fl_str_mv |
Archetypal analysis; Fuzzy proportional membership; Clustering data recovery; Fuzzy data generator; Fuzzy validation indices; Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering |
| topic |
Archetypal analysis; Fuzzy proportional membership; Clustering data recovery; Fuzzy data generator; Fuzzy validation indices; Domain/Scientific Area::Engineering and Technology::Electrical, Electronic and Computer Engineering |
| publishDate |
2018 |
| dc.date.none.fl_str_mv |
2018-12 2018 2018-12-01T00:00:00Z 2019-02-26T10:56:25Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/61679 |
| url |
http://hdl.handle.net/10362/61679 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833596464489562112 |