How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?

Bibliographic details
Main author: Rosa, Gustavo H. de [UNESP]
Publication date: 2019
Other authors: Brega, Jose R. F. [UNESP], Papa, Joao P. [UNESP]
Document type: Article
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://dx.doi.org/10.1007/s42452-019-1689-4
http://hdl.handle.net/11449/196603
Abstract: Traditional word-embedding approaches, such as bag-of-words models, tackle the problem of text-data representation by mapping each word in a document to a binary vector that marks whether or not it occurs. Additionally, a term frequency-inverse document frequency (TF-IDF) encoding provides a numerical statistic reflecting how important a particular word is in a document. Nevertheless, the major weakness of such models is the loss of contextual meaning, which prevents them from learning proper pieces of information. A neural-based embedding approach known as Word2Vec mitigates that issue by minimizing the loss of predicting a word's vector from its surrounding words. Furthermore, even though these embedding-based methods produce lower-dimensional data, the resulting vectors still have too many dimensions to be visualized directly. With that in mind, dimensionality-reduction techniques such as t-SNE provide a way to generate bi-dimensional data, allowing its visualization. One common problem of such reductions is the setting of their hyperparameters, such as the perplexity parameter. Therefore, this paper addresses the problem of selecting a suitable perplexity through a meta-heuristic optimization process. Meta-heuristic techniques, such as Artificial Bee Colony, Bat Algorithm, Genetic Programming, and Particle Swarm Optimization, are employed to find proper values for the perplexity parameter. The results reveal that optimizing t-SNE's perplexity is suitable for improving data visualization and is thus an exciting field to be fostered.
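The perplexity search the abstract describes can be sketched as a one-dimensional Particle Swarm Optimization over t-SNE's perplexity. This is an illustrative assumption, not the paper's exact setup: the function names are hypothetical, the PSO settings are arbitrary, and the fitness used here is t-SNE's own reported KL divergence, which is a simplification (KL values computed at different perplexities are not strictly comparable).

```python
# Hypothetical sketch: PSO over t-SNE's perplexity parameter.
import numpy as np
from sklearn.manifold import TSNE


def tsne_kl(perplexity, X, seed=0):
    """Fitness (assumed): the KL divergence t-SNE reports after embedding X in 2-D."""
    tsne = TSNE(n_components=2, perplexity=float(perplexity),
                init="random", random_state=seed)
    tsne.fit(X)
    return tsne.kl_divergence_


def pso_perplexity(X, lo=5.0, hi=50.0, n_particles=4, n_iters=3, seed=0):
    """Search [lo, hi] for a perplexity minimizing the fitness above."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, n_particles)          # particle positions
    vel = np.zeros(n_particles)                     # particle velocities
    pbest = pos.copy()                              # personal bests
    pbest_fit = np.array([tsne_kl(p, X) for p in pos])
    g = np.argmin(pbest_fit)
    gbest, gbest_fit = pbest[g], pbest_fit[g]       # global best
    w, c1, c2 = 0.7, 1.5, 1.5                       # inertia, cognitive, social weights
    for _ in range(n_iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)            # keep perplexity valid
        fit = np.array([tsne_kl(p, X) for p in pos])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if fit.min() < gbest_fit:
            gbest, gbest_fit = pos[np.argmin(fit)], fit.min()
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy stand-in for word-embedding vectors: 60 points in 20 dimensions.
    X = np.random.default_rng(42).normal(size=(60, 20))
    best_p, best_kl = pso_perplexity(X)
    print(f"best perplexity: {best_p:.2f}, KL divergence: {best_kl:.4f}")
```

Note that scikit-learn requires perplexity to be smaller than the number of samples, which is why the search interval is clipped; the paper's actual search ranges and swarm sizes are not reproduced here.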
id UNSP_a957a2f1d5d6d4eee295db275f2be7ed
oai_identifier_str oai:repositorio.unesp.br:11449/196603
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
spelling How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
Keywords: Word embeddings; Dimensionality reduction; Meta-heuristic optimization
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grant 2019/02205-5
Affiliation: Sao Paulo State Univ, Dept Comp, Av Eng Luiz Edmundo Carrijo Coube 14-01, Bauru, SP, Brazil
Publisher: Springer / Universidade Estadual Paulista (Unesp)
Authors: Rosa, Gustavo H. de [UNESP]; Brega, Jose R. F. [UNESP]; Papa, Joao P. [UNESP]
Dates: published 2019-12-01; deposited 2020-12-10T19:50:10Z
Type: info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article
Source: SN Applied Sciences. Cham: Springer International Publishing AG, v. 1, n. 12, 17 p., 2019. ISSN 2523-3963.
Identifiers: http://dx.doi.org/10.1007/s42452-019-1689-4; http://hdl.handle.net/11449/196603; DOI 10.1007/s42452-019-1689-4; WOS:000515158800026
Rights: info:eu-repo/semantics/openAccess
Repository: Repositório Institucional da UNESP (oai:repositorio.unesp.br:11449/196603), Universidade Estadual Paulista (UNESP), opendoar:2946, http://repositorio.unesp.br/oai/request, repositoriounesp@unesp.br; record updated 2024-04-23T16:10:48Z
dc.title.none.fl_str_mv How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
title How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
spellingShingle How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
Rosa, Gustavo H. de [UNESP]
Word embeddings
Dimensionality reduction
Meta-heuristic optimization
title_short How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
title_full How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
title_fullStr How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
title_full_unstemmed How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
title_sort How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
author Rosa, Gustavo H. de [UNESP]
author_facet Rosa, Gustavo H. de [UNESP]
Brega, Jose R. F. [UNESP]
Papa, Joao P. [UNESP]
author_role author
author2 Brega, Jose R. F. [UNESP]
Papa, Joao P. [UNESP]
author2_role author
author
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv Rosa, Gustavo H. de [UNESP]
Brega, Jose R. F. [UNESP]
Papa, Joao P. [UNESP]
dc.subject.por.fl_str_mv Word embeddings
Dimensionality reduction
Meta-heuristic optimization
topic Word embeddings
Dimensionality reduction
Meta-heuristic optimization
description Traditional word-embedding approaches, such as bag-of-words models, tackle the problem of text-data representation by mapping each word in a document to a binary vector that marks whether or not it occurs. Additionally, a term frequency-inverse document frequency (TF-IDF) encoding provides a numerical statistic reflecting how important a particular word is in a document. Nevertheless, the major weakness of such models is the loss of contextual meaning, which prevents them from learning proper pieces of information. A neural-based embedding approach known as Word2Vec mitigates that issue by minimizing the loss of predicting a word's vector from its surrounding words. Furthermore, even though these embedding-based methods produce lower-dimensional data, the resulting vectors still have too many dimensions to be visualized directly. With that in mind, dimensionality-reduction techniques such as t-SNE provide a way to generate bi-dimensional data, allowing its visualization. One common problem of such reductions is the setting of their hyperparameters, such as the perplexity parameter. Therefore, this paper addresses the problem of selecting a suitable perplexity through a meta-heuristic optimization process. Meta-heuristic techniques, such as Artificial Bee Colony, Bat Algorithm, Genetic Programming, and Particle Swarm Optimization, are employed to find proper values for the perplexity parameter. The results reveal that optimizing t-SNE's perplexity is suitable for improving data visualization and is thus an exciting field to be fostered.
publishDate 2019
dc.date.none.fl_str_mv 2019-12-01
2020-12-10T19:50:10Z
2020-12-10T19:50:10Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://dx.doi.org/10.1007/s42452-019-1689-4
SN Applied Sciences. Cham: Springer International Publishing AG, v. 1, n. 12, 17 p., 2019.
2523-3963
http://hdl.handle.net/11449/196603
10.1007/s42452-019-1689-4
WOS:000515158800026
url http://dx.doi.org/10.1007/s42452-019-1689-4
http://hdl.handle.net/11449/196603
identifier_str_mv SN Applied Sciences. Cham: Springer International Publishing AG, v. 1, n. 12, 17 p., 2019.
2523-3963
10.1007/s42452-019-1689-4
WOS:000515158800026
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv SN Applied Sciences
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 17
dc.publisher.none.fl_str_mv Springer
publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv Web of Science
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv repositoriounesp@unesp.br
_version_ 1834483409976557568