How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization?
| Lead author: | |
|---|---|
| Publication date: | 2019 |
| Other authors: | |
| Document type: | Article |
| Language: | eng |
| Source: | Repositório Institucional da UNESP |
| Full text: | http://dx.doi.org/10.1007/s42452-019-1689-4 http://hdl.handle.net/11449/196603 |
| Abstract: | Traditional word embedding approaches, such as bag-of-words models, tackle the problem of text data representation by linking the words in a document to a binary vector that marks their occurrence or absence. Additionally, term frequency-inverse document frequency encoding provides a numerical statistic reflecting how important a particular word is to a document. Nevertheless, the major weakness of such models is the loss of contextual meaning, which prevents them from learning proper pieces of information. A newer neural-based embedding approach, known as Word2Vec, mitigates that issue by minimizing the loss of predicting a vector for a particular word from its surrounding words. However, as these embedding-based methods produce high-dimensional data, it is impossible to visualize them directly. With that in mind, dimensionality reduction techniques, such as t-SNE, provide a way to generate bi-dimensional data, allowing its visualization. One common problem with such reductions is the setting of their hyperparameters, such as the perplexity parameter. Therefore, this paper addresses the problem of selecting a suitable perplexity through a meta-heuristic optimization process. Meta-heuristic-driven techniques, such as Artificial Bee Colony, Bat Algorithm, Genetic Programming, and Particle Swarm Optimization, are employed to find proper values for the perplexity parameter. The results revealed that optimizing t-SNE's perplexity improves data visualization and is thus an exciting field to be fostered. |
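The abstract hinges on t-SNE's perplexity hyperparameter. The sketch below is a minimal pure-NumPy illustration of what that parameter controls: the bandwidth-calibration (binary search) step inside t-SNE that turns a target perplexity into a Gaussian kernel width per point. It is not the paper's meta-heuristic pipeline, and all function names are illustrative.

```python
import numpy as np

def conditional_probs(sq_dists, sigma):
    """Gaussian conditional probabilities p_{j|i} over one point's neighbours."""
    p = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return p / p.sum()

def perplexity(p):
    """Perplexity = 2^H(P): a smooth 'effective number of neighbours'."""
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return 2.0 ** entropy

def sigma_for_perplexity(sq_dists, target, tol=1e-4, max_iter=200):
    """Binary-search the bandwidth sigma until the induced distribution's
    perplexity matches the target -- t-SNE's per-point calibration step."""
    lo, hi = 1e-6, 1e6
    sigma = 1.0
    for _ in range(max_iter):
        sigma = 0.5 * (lo + hi)
        perp = perplexity(conditional_probs(sq_dists, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:   # distribution too flat -> shrink bandwidth
            hi = sigma
        else:               # too peaked -> widen bandwidth
            lo = sigma
    return sigma

rng = np.random.default_rng(42)
points = rng.normal(size=(100, 50))                  # 100 points in 50-D
sq_dists = np.sum((points[0] - points[1:]) ** 2, axis=1)
sigma = sigma_for_perplexity(sq_dists, target=30.0)
print(perplexity(conditional_probs(sq_dists, sigma)))  # within 1e-4 of 30.0
```

In the paper's setting, a meta-heuristic such as PSO searches over candidate perplexity values, scoring each full t-SNE run; the calibration above is the mechanism each candidate value feeds into.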
| id | UNSP_a957a2f1d5d6d4eee295db275f2be7ed |
|---|---|
| oai_identifier_str | oai:repositorio.unesp.br:11449/196603 |
| network_acronym_str | UNSP |
| network_name_str | Repositório Institucional da UNESP |
| repository_id_str | 2946 |
| spelling | Title: How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization? Keywords: Word embeddings; Dimensionality reduction; Meta-heuristic optimization. Abstract: as above. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grant 2019/02205-5. Affiliation: Sao Paulo State Univ, Dept Comp, Av Eng Luiz Edmundo Carrijo Coube 14-01, Bauru, SP, Brazil. Publisher: Springer. Authors: Rosa, Gustavo H. de [UNESP]; Brega, Jose R. F. [UNESP]; Papa, Joao P. [UNESP]. Deposited: 2020-12-10T19:50:10Z. Published: 2019-12-01. Source: SN Applied Sciences. Cham: Springer International Publishing AG, v. 1, n. 12, 17 p., 2019. ISSN 2523-3963. DOI: 10.1007/s42452-019-1689-4. Handle: http://hdl.handle.net/11449/196603. WOS:000515158800026. Open access. |
| dc.title.none.fl_str_mv |
How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization? |
| title |
How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization? |
| spellingShingle |
How optimizing perplexity can affect the dimensionality reduction on word embeddings visualization? Rosa, Gustavo H. de [UNESP] Word embeddings Dimensionality reduction Meta-heuristic optimization |
| author |
Rosa, Gustavo H. de [UNESP] |
| author_facet |
Rosa, Gustavo H. de [UNESP] Brega, Jose R. F. [UNESP] Papa, Joao P. [UNESP] |
| author_role |
author |
| author2 |
Brega, Jose R. F. [UNESP] Papa, Joao P. [UNESP] |
| author2_role |
author author |
| dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (Unesp) |
| dc.contributor.author.fl_str_mv |
Rosa, Gustavo H. de [UNESP] Brega, Jose R. F. [UNESP] Papa, Joao P. [UNESP] |
| dc.subject.por.fl_str_mv |
Word embeddings Dimensionality reduction Meta-heuristic optimization |
| topic |
Word embeddings Dimensionality reduction Meta-heuristic optimization |
| publishDate |
2019 |
| dc.date.none.fl_str_mv |
2019-12-01 2020-12-10T19:50:10Z 2020-12-10T19:50:10Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
| format |
article |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1007/s42452-019-1689-4; SN Applied Sciences. Cham: Springer International Publishing AG, v. 1, n. 12, 17 p., 2019; ISSN 2523-3963; http://hdl.handle.net/11449/196603; DOI 10.1007/s42452-019-1689-4; WOS:000515158800026 |
| url |
http://dx.doi.org/10.1007/s42452-019-1689-4 http://hdl.handle.net/11449/196603 |
| identifier_str_mv |
SN Applied Sciences. Cham: Springer International Publishing AG, v. 1, n. 12, 17 p., 2019; ISSN 2523-3963; DOI 10.1007/s42452-019-1689-4; WOS:000515158800026 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
SN Applied Sciences |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
17 |
| dc.publisher.none.fl_str_mv |
Springer |
| publisher.none.fl_str_mv |
Springer |
| dc.source.none.fl_str_mv |
Web of Science reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP |
| instname_str |
Universidade Estadual Paulista (UNESP) |
| instacron_str |
UNESP |
| institution |
UNESP |
| reponame_str |
Repositório Institucional da UNESP |
| collection |
Repositório Institucional da UNESP |
| repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
| repository.mail.fl_str_mv |
repositoriounesp@unesp.br |
| _version_ |
1834483409976557568 |