Topic modelling: a consistent framework for comparative studies and its practical application

Bibliographic Details
Main Author: Amaro, Ana Margarida Rocha
Publication Date: 2022
Format: Master thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10362/144705
Summary: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
id RCAP_ae238f25a75ca6bce7b11e7567396a9f
oai_identifier_str oai:run.unl.pt:10362/144705
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
Keywords: Natural Language Processing; Top2Vec; Topic Coherence; Topic Modelling; Unsupervised Learning
Contributor: Bação, Fernando José Ferreira Lucas
Record Dates: 2022-10-14T13:08:36Z; 2022-10-03
Access: embargoed access
File Format: application/pdf
Identifier: TID:203076524
Funding: This research was part of the DSAIPA/DS/0116/2019 project, supported by a grant from the Portuguese Foundation for Science and Technology ("Fundação para a Ciência e a Tecnologia").
Abstract: Topic Modelling (TM) is an unsupervised learning method for finding latent semantic structure in a set of documents, grouping them according to their semantic content. Although several TM algorithms have been proposed in the literature, they are commonly not validated against the same datasets and evaluation metrics. At the same time, existing surveys rely on a small number of algorithms or, given the pace of advances in the field, exclude models that have been presented with state-of-the-art results. Consequently, this work aims to present a more complete comparative study of the performance of different TM techniques, evaluated on three datasets from different contexts: the 20 Newsgroup dataset, the Yahoo! Q&A dataset, and the BIG Patent dataset. The experiments, evaluated primarily through the Context Vectors (CV) Topic Coherence, indicate that Top2Vec is the best-performing model across all datasets. Given these results, an exploratory analysis was conducted on a newly introduced dataset containing abstracts of articles funded by Central Banks and other international organizations. This analysis is intended to provide an informative outlook on the organizations' diverse topics of interest and their evolution over the period under study. In short, the major contribution of this work is an updated survey of state-of-the-art TM approaches, together with a demonstration of their practical usability in a new context and an exploration of the insights obtained.