Mapintel: enhancing competitive intelligence acquisition through embeddings and visual analytics
Main Author: | Silva, David Fontes Henriques Silvestre da |
---|---|
Publication Date: | 2022 |
Format: | Master thesis |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10362/140849 |
Summary: | Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics |
id |
RCAP_33e865eb4ef9f49821ed060579a7e07f |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/140849 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
Mapintel: enhancing competitive intelligence acquisition through embeddings and visual analytics
Natural Language Processing; Transformer Architecture; UMAP; Information Encountering; Information Retrieval; Competitive Intelligence; Sentence Embeddings; Topic Modeling; Unsupervised Learning
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mostly performed by analysts scanning for any piece of valuable information in a myriad of dispersed and unstructured sources. Here we present MapIntel, a system for acquiring intelligence from large collections of text data by representing each document as a multidimensional vector that captures its own semantics. The system is designed to handle complex Natural Language queries and visual exploration of the corpus, potentially aiding overburdened analysts in finding meaningful insights to help decision-making. The searching module of the system uses a retriever and re-ranker engine that first finds the closest neighbors to the query embedding, and then sifts the results through a cross-encoder model that identifies the most relevant documents. The browsing or visualization module also leverages the embeddings by projecting them onto 2 dimensions while preserving the multidimensional landscape, resulting in a map where semantically related documents form topical clusters which we capture using topic modeling. This map aims at promoting a fast overview of the corpus while allowing a more detailed exploration and interactive information encountering process. We evaluate the system and its components on the 20 newsgroups dataset, making use of the semantic document labels provided, and we demonstrate the superiority of Transformer-based components. Finally, we present a prototype of the system in Python and show how some of its features can be used to acquire intelligence from a news article corpus we collected during a period of 8 months. |
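The retriever and re-ranker search described in the summary above can be illustrated with a minimal Python sketch. This is not the thesis implementation: the 3-dimensional vectors are hand-made toy values rather than sentence embeddings, and `token_overlap` is a hypothetical stand-in for the cross-encoder model, used only so the example runs without any model download. The function names `retrieve` and `rerank` are likewise illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Sequence[float],
             doc_vecs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Stage 1: indices of the k documents closest to the query embedding."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]


def rerank(query: str, docs: Sequence[str], candidates: Sequence[int],
           score_pair: Callable[[str, str], float]) -> List[int]:
    """Stage 2: reorder the candidates with a (query, document) pair scorer."""
    return sorted(candidates,
                  key=lambda i: score_pair(query, docs[i]),
                  reverse=True)


def token_overlap(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: Jaccard token overlap."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


docs = [
    "central bank raises interest rates",
    "new graphics card released for gaming",
    "interest rates and inflation outlook for banks",
    "local team wins hockey final",
]
# Toy embeddings, invented for illustration (not produced by any model).
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.0, 0.9],
]
query = "interest rates outlook"
query_vec = [1.0, 0.1, 0.0]

candidates = retrieve(query_vec, doc_vecs, k=2)           # nearest neighbors
ranked = rerank(query, docs, candidates, token_overlap)   # pairwise re-scoring
print([docs[i] for i in ranked])
```

The two stages are kept separate because the first is cheap and runs over the whole corpus, while the pair scorer is applied only to the short candidate list. In a real system the scorer would presumably be a trained cross-encoder rather than a token-overlap heuristic.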
dc.title.none.fl_str_mv |
Mapintel: enhancing competitive intelligence acquisition through embeddings and visual analytics |
title |
Mapintel: enhancing competitive intelligence acquisition through embeddings and visual analytics |
spellingShingle |
Mapintel: enhancing competitive intelligence acquisition through embeddings and visual analytics; Silva, David Fontes Henriques Silvestre da; Natural Language Processing; Transformer Architecture; UMAP; Information Encountering; Information Retrieval; Competitive Intelligence; Sentence Embeddings; Topic Modeling; Unsupervised Learning |
author |
Silva, David Fontes Henriques Silvestre da |
author_facet |
Silva, David Fontes Henriques Silvestre da |
author_role |
author |
dc.contributor.none.fl_str_mv |
Bação, Fernando José Ferreira Lucas; RUN |
dc.contributor.author.fl_str_mv |
Silva, David Fontes Henriques Silvestre da |
dc.subject.por.fl_str_mv |
Natural Language Processing; Transformer Architecture; UMAP; Information Encountering; Information Retrieval; Competitive Intelligence; Sentence Embeddings; Topic Modeling; Unsupervised Learning |
topic |
Natural Language Processing; Transformer Architecture; UMAP; Information Encountering; Information Retrieval; Competitive Intelligence; Sentence Embeddings; Topic Modeling; Unsupervised Learning |
description |
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-06-27T13:34:57Z 2022-05-18 2022-05-18T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/140849 TID:203028414 |
url |
http://hdl.handle.net/10362/140849 |
identifier_str_mv |
TID:203028414 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833596794019250176 |