Topic modelling: a consistent framework for comparative studies and its practical application
| Main Author: | Amaro, Ana Margarida Rocha |
|---|---|
| Publication Date: | 2022 |
| Format: | Master thesis |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | http://hdl.handle.net/10362/144705 |
| Summary: | Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics |
| id | RCAP_ae238f25a75ca6bce7b11e7567396a9f |
|---|---|
| oai_identifier_str | oai:run.unl.pt:10362/144705 |
| network_acronym_str | RCAP |
| network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str | https://opendoar.ac.uk/repository/7160 |
| spelling | Topic modelling: a consistent framework for comparative studies and its practical application; Natural Language Processing; Top2Vec; Topic Coherence; Topic Modelling; Unsupervised Learning; Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics. This research was part of the DSAIPA/DS/0116/2019 project, supported by a grant of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”). Topic Modelling (TM) is an unsupervised learning method for finding latent semantic structure in a set of documents, grouping them according to their semantic content. Although several TM algorithms have been proposed in the literature, they are commonly not validated against the same datasets and evaluation metrics. At the same time, current surveys rely on a reduced number of algorithms or, given the pace of advances in the field, exclude models that have been presented with state-of-the-art results. Consequently, in this work, we aim to present a more complete comparative study of the performance of different TM techniques, evaluated on three datasets arising from different contexts: the 20 Newsgroup dataset, the Yahoo! Q&A dataset, and the BIG Patent dataset. The experiments, evaluated primarily through the Context Vectors (CV) Topic Coherence, indicate that Top2Vec is the best-performing model across all datasets. Given the results obtained, an exploratory analysis was conducted on a newly introduced dataset containing abstracts of articles funded by Central Banks and other international organizations. This endeavour is intended to provide an informative outlook on the organizations’ diverse topics of interest and their evolution over the period under study. In short, the major contribution of this work is to offer an updated survey of state-of-the-art TM approaches, while demonstrating their practical usability in a new context and exploring the insights obtained. Bação, Fernando José Ferreira Lucas; RUN; Amaro, Ana Margarida Rocha; 2022-10-14T13:08:36Z; 2022-10-03; 2022-10-03T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/144705; TID:203076524; eng; info:eu-repo/semantics/embargoedAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2024-05-22T18:05:59Z; oai:run.unl.pt:10362/144705; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-28T17:36:42.713359; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
| dc.title.none.fl_str_mv | Topic modelling: a consistent framework for comparative studies and its practical application |
| title | Topic modelling: a consistent framework for comparative studies and its practical application |
| author | Amaro, Ana Margarida Rocha |
| author_facet | Amaro, Ana Margarida Rocha |
| author_role | author |
| dc.contributor.none.fl_str_mv | Bação, Fernando José Ferreira Lucas; RUN |
| dc.contributor.author.fl_str_mv | Amaro, Ana Margarida Rocha |
| dc.subject.por.fl_str_mv | Natural Language Processing; Top2Vec; Topic Coherence; Topic Modelling; Unsupervised Learning |
| topic | Natural Language Processing; Top2Vec; Topic Coherence; Topic Modelling; Unsupervised Learning |
| description | Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics |
| publishDate | 2022 |
| dc.date.none.fl_str_mv | 2022-10-14T13:08:36Z; 2022-10-03; 2022-10-03T00:00:00Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10362/144705; TID:203076524 |
| url | http://hdl.handle.net/10362/144705 |
| identifier_str_mv | TID:203076524 |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/embargoedAccess |
| eu_rights_str_mv | embargoedAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.source.none.fl_str_mv | reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| instname_str | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str | RCAAP |
| institution | RCAAP |
| reponame_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv | info@rcaap.pt |
| _version_ | 1833596830040981504 |