An MML embedded approach for estimating the number of clusters

Bibliographic Details
Main Author: Silvestre, Cláudia
Publication Date: 2023
Other Authors: Cardoso, Maria Margarida G. M. S., Figueiredo, Mário
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10400.21/16694
Summary: Assuming that the data originate from a finite mixture of multinomial distributions, we study the performance of an integrated Expectation-Maximization (EM) algorithm that uses the Minimum Message Length (MML) criterion to select the number of mixture components. Rather than selecting one among a set of pre-estimated candidate models, which requires running EM several times, this EM-MML approach integrates estimation and model selection in a single algorithm. We compare it with EM combined with well-known information criteria, e.g. the Bayesian Information Criterion, using synthetic data examples and a real application. The EM-MML computation time is a clear advantage of the method; in addition, the solution it provides for the real data is more parsimonious, which reduces the risk of overestimating the model order and improves interpretability.
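Illustrative sketch (not taken from the chapter): the summary describes a single EM run that estimates the parameters and prunes components at the same time. The Python code below is a minimal sketch of that idea for categorical data. The function name em_mml_categorical, its parameters, the batch (non-component-wise) updates, and the specific weight update that subtracts half the per-component parameter count are all assumptions made for illustration; the record does not give the authors' exact criterion or update rules.

```python
import numpy as np

def em_mml_categorical(X, n_cats, k_max=6, n_iter=200, seed=0):
    """EM for a mixture of independent categorical variables, with an
    MML-style weight update that can annihilate components (assumed form).
    X: (n, J) integer array with X[i, j] in {0, ..., n_cats[j] - 1}."""
    rng = np.random.default_rng(seed)
    # One-hot encode each variable once: list of (n, C_j) arrays.
    onehot = [np.eye(c)[X[:, j]] for j, c in enumerate(n_cats)]
    # Free parameters per component: sum over variables of (C_j - 1).
    n_par = sum(c - 1 for c in n_cats)
    # Start deliberately with too many components (k_max).
    alpha = np.full(k_max, 1.0 / k_max)
    # theta[j] has shape (K, C_j): category probabilities per component.
    theta = [rng.dirichlet(np.ones(c), size=k_max) for c in n_cats]
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        log_p = np.log(alpha)[None, :] + sum(
            oh @ np.log(th).T for oh, th in zip(onehot, theta))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step for the weights: penalised counts, clipped at zero.
        nk = resp.sum(axis=0)
        alpha = np.maximum(0.0, nk - n_par / 2.0)
        if alpha.sum() == 0.0:
            alpha[np.argmax(nk)] = 1.0  # guard: never drop every component
        alpha /= alpha.sum()
        # Components whose weight reached zero are removed for good.
        keep = alpha > 0
        alpha, resp = alpha[keep], resp[:, keep]
        # M-step for the category probabilities (small constant avoids log(0)).
        theta = [resp.T @ oh + 1e-9 for oh in onehot]
        theta = [th / th.sum(axis=1, keepdims=True) for th in theta]
    return alpha, theta
```

The number of clusters is then read off as len(alpha) after a single run. A criterion-based baseline such as BIC would instead fit one model per candidate number of components and compare them afterwards, which is the repeated-EM cost the summary refers to.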
id RCAP_9497d70c296e9c13e60cef14d1d70511
oai_identifier_str oai:repositorio.ipl.pt:10400.21/16694
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
author Silvestre, Cláudia
author_facet Silvestre, Cláudia
Cardoso, Maria Margarida G. M. S.
Figueiredo, Mário
author_role author
author2 Cardoso, Maria Margarida G. M. S.
Figueiredo, Mário
author2_role author
author
dc.contributor.none.fl_str_mv RCIPL
dc.contributor.author.fl_str_mv Silvestre, Cláudia
Cardoso, Maria Margarida G. M. S.
Figueiredo, Mário
dc.subject.por.fl_str_mv Finite mixture model
EM algorithm
Model selection
Minimum message length
Categorical data
topic Finite mixture model
EM algorithm
Model selection
Minimum message length
Categorical data
publishDate 2023
dc.date.none.fl_str_mv 2023-12-13T11:45:56Z
2023-12-08
2023-12-08T00:00:00Z
dc.type.driver.fl_str_mv book part
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.21/16694
url http://hdl.handle.net/10400.21/16694
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 978-3-031-09034-9
978-3-031-09033-2 (print)
https://doi.org/10.1007/978-3-031-09034-9_38
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer
publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833598349788315648