An MML embedded approach for estimating the number of clusters
Main Author: | Silvestre, Cláudia |
---|---|
Publication Date: | 2023 |
Other Authors: | Cardoso, Maria Margarida G. M. S.; Figueiredo, Mário |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10400.21/16694 |
Summary: | Assuming that the data originate from a finite mixture of multinomial distributions, we study the performance of an integrated Expectation Maximization (EM) algorithm that uses the Minimum Message Length (MML) criterion to select the number of mixture components. Rather than selecting one among a set of pre-estimated candidate models (which requires running EM several times), this EM-MML approach integrates estimation and model selection in a single algorithm. Comparisons are provided with EM combined with well-known information criteria, e.g. the Bayesian Information Criterion. We use synthetic data examples and a real application. The EM-MML computation time is a clear advantage of this method; also, the solution it provides for the real data is more parsimonious, which reduces the risk of model order overestimation and improves interpretability. |
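
As a rough illustration of the baseline that the summary compares against (EM run once per candidate number of components, followed by selection with the Bayesian Information Criterion), the sketch below fits mixtures of categorical variables with NumPy. It is not the authors' EM-MML algorithm. The function names (`fit_mixture`, `select_by_bic`), the assumption that every feature has the same number of categories `n_cat`, and the small smoothing constants are choices made here for illustration only.

```python
import numpy as np


def _log_joint(onehot, weights, theta):
    # log w_k + sum_j log theta[k, j, x_ij] for each observation i and component k
    return np.einsum("ijc,kjc->ik", onehot, np.log(theta)) + np.log(weights)


def _log_sum_exp(a):
    # Numerically stable log(sum(exp(a))) along the component axis
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))


def fit_mixture(X, n_cat, K, n_iter=300, tol=1e-8, seed=0):
    """EM for a K-component mixture; X holds integer category codes, shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    onehot = np.zeros((n, d, n_cat))
    onehot[np.arange(n)[:, None], np.arange(d)[None, :], X] = 1.0
    weights = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(n_cat), size=(K, d))  # shape (K, d, n_cat)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each observation
        log_joint = _log_joint(onehot, weights, theta)
        log_norm = _log_sum_exp(log_joint)
        loglik = log_norm.sum()
        resp = np.exp(log_joint - log_norm)
        # M-step: update mixing weights and category probabilities
        weights = np.maximum(resp.mean(axis=0), 1e-12)  # keep log() finite
        weights /= weights.sum()
        counts = np.einsum("ik,ijc->kjc", resp, onehot) + 1e-6  # light smoothing
        theta = counts / counts.sum(axis=2, keepdims=True)
        if loglik - prev < tol * abs(loglik):
            break
        prev = loglik
    # Log-likelihood at the final parameter values
    loglik = _log_sum_exp(_log_joint(onehot, weights, theta)).sum()
    return loglik, weights, theta


def bic(loglik, n, d, n_cat, K):
    # Free parameters: K - 1 mixing weights plus K * d * (n_cat - 1) probabilities
    n_params = (K - 1) + K * d * (n_cat - 1)
    return -2.0 * loglik + n_params * np.log(n)


def select_by_bic(X, n_cat, candidates=(1, 2, 3, 4)):
    # One full EM run per candidate K: the repeated cost the summary refers to
    n, d = X.shape
    scores = {K: bic(fit_mixture(X, n_cat, K)[0], n, d, n_cat, K)
              for K in candidates}
    return min(scores, key=scores.get), scores
```

Calling `select_by_bic(X, n_cat)` on an integer-coded data matrix returns the candidate with the lowest BIC together with all scores. The loop over `candidates` is what an integrated approach such as EM-MML is described as avoiding.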
id |
RCAP_9497d70c296e9c13e60cef14d1d70511 |
---|---|
oai_identifier_str |
oai:repositorio.ipl.pt:10400.21/16694 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
dc.title.none.fl_str_mv |
An MML embedded approach for estimating the number of clusters |
author |
Silvestre, Cláudia |
author_facet |
Silvestre, Cláudia; Cardoso, Maria Margarida G. M. S.; Figueiredo, Mário |
author_role |
author |
author2 |
Cardoso, Maria Margarida G. M. S.; Figueiredo, Mário |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
RCIPL |
dc.contributor.author.fl_str_mv |
Silvestre, Cláudia; Cardoso, Maria Margarida G. M. S.; Figueiredo, Mário |
dc.subject.por.fl_str_mv |
Finite mixture model; EM algorithm; Model selection; Minimum message length; Categorical data |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-12-13T11:45:56Z; 2023-12-08; 2023-12-08T00:00:00Z |
dc.type.driver.fl_str_mv |
book part |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.21/16694 |
url |
http://hdl.handle.net/10400.21/16694 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
978-3-031-09034-9; 978-3-031-09033-2 (print); https://doi.org/10.1007/978-3-031-09034-9_38 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Springer |
publisher.none.fl_str_mv |
Springer |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |