MapIntel: enhancing competitive intelligence acquisition through embeddings and visual analytics

Bibliographic Details
Main Author: Silva, David Fontes Henriques Silvestre da
Publication Date: 2022
Format: Master thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10362/140849
Summary: Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
id RCAP_33e865eb4ef9f49821ed060579a7e07f
oai_identifier_str oai:run.unl.pt:10362/140849
network_acronym_str RCAP
repository_id_str https://opendoar.ac.uk/repository/7160
Abstract: Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mostly performed by analysts who scan a myriad of dispersed and unstructured sources for any piece of valuable information. Here we present MapIntel, a system for acquiring intelligence from large collections of text data by representing each document as a multidimensional vector that captures its semantics. The system is designed to handle complex natural language queries and visual exploration of the corpus, potentially helping overburdened analysts find meaningful insights that support decision-making. The search module uses a retriever and re-ranker engine: it first finds the nearest neighbors of the query embedding, then passes these candidates through a cross-encoder model that identifies the most relevant documents. The browsing, or visualization, module also leverages the embeddings by projecting them onto two dimensions while preserving, as far as possible, the structure of the high-dimensional space. The result is a map where semantically related documents form topical clusters, which we capture using topic modeling. This map aims to give a fast overview of the corpus while allowing more detailed exploration and an interactive information encountering process. We evaluate the system and its components on the 20 Newsgroups dataset, making use of the semantic document labels it provides, and we show that the Transformer-based components outperform the alternatives evaluated.
Finally, we present a prototype of the system in Python and show how some of its features can be used to acquire intelligence from a news article corpus we collected over a period of eight months.
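The two-stage retrieve-and-re-rank search described in the abstract can be sketched in a few lines of Python. This is a minimal, dependency-free illustration under stated assumptions, not the MapIntel implementation. A normalised bag-of-words vector stands in for a sentence embedding, and an exhaustive cosine-similarity scan stands in for a nearest-neighbour index. The hypothetical `token_jaccard` function stands in for the cross-encoder; a real cross-encoder is a Transformer that reads the query and the document together (the sentence-transformers library, for example, provides a `CrossEncoder` class for this). All function names here (`build_vocab`, `embed`, `retrieve`, `rerank`, `search`, `token_jaccard`) are illustrative.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = List[float]


def build_vocab(texts: Sequence[str]) -> List[str]:
    """Sorted vocabulary of lower-cased whitespace tokens (toy tokenizer)."""
    return sorted({tok for text in texts for tok in text.lower().split()})


def embed(text: str, vocab: Sequence[str]) -> Vector:
    """Toy stand-in for a sentence encoder: L2-normalised bag-of-words counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are L2-normalised, so their dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec: Vector, doc_vecs: Sequence[Vector], k: int) -> List[int]:
    """Stage 1: exhaustive nearest-neighbour search; returns the top-k document indices."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scored[:k]]


def token_jaccard(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: scores the (query, doc) pair jointly."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(query: str, docs: Sequence[str], candidates: Sequence[int],
           score_pair: Callable[[str, str], float]) -> List[int]:
    """Stage 2: rescore only the retrieved candidates with the pairwise scorer."""
    scored = [(i, score_pair(query, docs[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scored]


def search(query: str, docs: Sequence[str], k: int,
           score_pair: Callable[[str, str], float] = token_jaccard) -> List[int]:
    """Retrieve k candidates by embedding similarity, then re-rank them pairwise."""
    vocab = build_vocab(list(docs) + [query])
    doc_vecs = [embed(d, vocab) for d in docs]
    candidates = retrieve(embed(query, vocab), doc_vecs, k)
    return rerank(query, docs, candidates, score_pair)


if __name__ == "__main__":
    headlines = [
        "central bank raises interest rates",
        "new smartphone launch boosts sales",
        "interest rates and inflation outlook",
        "football club signs new striker",
    ]
    print(search("interest rates outlook", headlines, k=2))  # [2, 0]
```

With these four toy headlines, the retriever keeps only documents 2 and 0, the two that share words with the query. The pairwise scorer then orders them as [2, 0]. The point of the split is cost. The cheap first stage narrows the corpus to k candidates, so the expensive joint scorer runs k times rather than once per document.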
dc.contributor.none.fl_str_mv Bação, Fernando José Ferreira Lucas
RUN
dc.subject.por.fl_str_mv Natural Language Processing
Transformer Architecture
UMAP
Information Encountering
Information Retrieval
Competitive Intelligence
Sentence Embeddings
Topic Modeling
Unsupervised Learning
dc.date.none.fl_str_mv 2022-06-27T13:34:57Z
2022-05-18
2022-05-18T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/140849
TID:203028414
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.mail.fl_str_mv info@rcaap.pt