WORD2VEC: A SAUSSUREAN ALGORITHM
Main Author: | Giamarusti, Leonardo |
---|---|
Publication Date: | 2025 |
Format: | preprint |
Language: | por |
Source: | SciELO Preprints |
Download full: | https://preprints.scielo.org/index.php/scielo/preprint/view/11678 |
Summary: | This article proposes an interpretation of how Word2Vec, an algorithm for generating word embeddings, works in light of Ferdinand de Saussure’s Theory of Value (TdV). In recent years, Word2Vec has proven highly useful for various NLP tasks, such as text classification, sentiment analysis, and word occurrence probability estimation, due to its handling of high-dimensional vectors. I argue, therefore, that this language model allows us to recognize that certain theoretical notions from Saussurean linguistics (namely system, sign, and value) remain productive for reflecting on the theoretical and epistemological aspects involved in meaning determination in natural languages, as well as on how such notions appear to be emulated by modern NLP techniques such as Word2Vec. The study begins with a critique of the limitations of TF-IDF, proceeds through the influence of Distributional Semantics and the Distributional Hypothesis on modern vector-based language models, and ultimately suggests that Word2Vec shows signs of being able to operationalize, at the level of computational semantics, what Saussure had already formulated conceptually in the early twentieth century: that the meaning of a word is neither fixed nor individual, but relational and determined by the similar and dissimilar values that surround it. The Saussurean sources used in this research to define the conceptual framework are the Course in General Linguistics; the manuscript collection Notes pour le 3e Cours; and the notebook of Émile Constantin, a student in Saussure’s Third Course in General Linguistics, taught in Geneva between 1910 and 1911. Our objective, then, is to propose that the Saussurean notions of similia and dissimilia can be identified within the theoretical underpinnings of Word2Vec, promoting a convergence between Saussurean theory and contemporary NLP. 
The central hypothesis of this work is that Word2Vec can be read as a Saussurean algorithm, as it computationally applies the dynamics of linguistic values to emulate the way meanings are determined through the relations between words, as foreseen by the Genevan master more than a century ago. |
id |
SCI-1_a457ec025954bfbbc34b71e2eab52a1e |
---|---|
oai_identifier_str |
oai:ops.preprints.scielo.org:preprint/11678 |
network_acronym_str |
SCI-1 |
network_name_str |
SciELO Preprints |
repository_id_str |
|
dc.title.none.fl_str_mv |
WORD2VEC: A SAUSSUREAN ALGORITHM Word2Vec: Um algoritmo saussuriano |
title |
WORD2VEC: A SAUSSUREAN ALGORITHM |
spellingShingle |
WORD2VEC: A SAUSSUREAN ALGORITHM; Giamarusti, Leonardo; Saussure; Processamento de Linguagem Natural; Word2Vec; Teoria do Valor; Saussure; Natural Language Processing; Word2Vec; Theory of Value |
title_short |
WORD2VEC: A SAUSSUREAN ALGORITHM |
title_full |
WORD2VEC: A SAUSSUREAN ALGORITHM |
title_fullStr |
WORD2VEC: A SAUSSUREAN ALGORITHM |
title_full_unstemmed |
WORD2VEC: A SAUSSUREAN ALGORITHM |
title_sort |
WORD2VEC: A SAUSSUREAN ALGORITHM |
author |
Giamarusti, Leonardo |
author_facet |
Giamarusti, Leonardo |
author_role |
author |
dc.contributor.author.fl_str_mv |
Giamarusti, Leonardo |
dc.subject.por.fl_str_mv |
Saussure; Processamento de Linguagem Natural; Word2Vec; Teoria do Valor; Saussure; Natural Language Processing; Word2Vec; Theory of Value |
topic |
Saussure; Processamento de Linguagem Natural; Word2Vec; Teoria do Valor; Saussure; Natural Language Processing; Word2Vec; Theory of Value |
publishDate |
2025 |
dc.date.none.fl_str_mv |
2025-04-17 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/preprint info:eu-repo/semantics/publishedVersion |
format |
preprint |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://preprints.scielo.org/index.php/scielo/preprint/view/11678 10.1590/SciELOPreprints.11678 |
url |
https://preprints.scielo.org/index.php/scielo/preprint/view/11678 |
identifier_str_mv |
10.1590/SciELOPreprints.11678 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.none.fl_str_mv |
https://preprints.scielo.org/index.php/scielo/preprint/view/11678/21267 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2025 Leonardo Giamarusti https://creativecommons.org/licenses/by/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2025 Leonardo Giamarusti https://creativecommons.org/licenses/by/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
SciELO Preprints |
publisher.none.fl_str_mv |
SciELO Preprints |
dc.source.none.fl_str_mv |
reponame:SciELO Preprints instname:Scientific Electronic Library Online (SCIELO) instacron:SCI |
instname_str |
Scientific Electronic Library Online (SCIELO) |
instacron_str |
SCI |
institution |
SCI |
reponame_str |
SciELO Preprints |
collection |
SciELO Preprints |
repository.name.fl_str_mv |
SciELO Preprints - Scientific Electronic Library Online (SCIELO) |
repository.mail.fl_str_mv |
scielo.submission@scielo.org |
_version_ |
1831964362435198976 |