WORD2VEC: A SAUSSUREAN ALGORITHM

Bibliographic Details
Main Author: Giamarusti, Leonardo
Publication Date: 2025
Format: preprint
Language: por
Source: SciELO Preprints
Download full: https://preprints.scielo.org/index.php/scielo/preprint/view/11678
Summary: This article proposes an interpretation of how Word2Vec, an algorithm for generating word embeddings, works, in light of Ferdinand de Saussure’s Theory of Value (TdV). In recent years, Word2Vec has proven highly useful for various NLP tasks, such as text classification, sentiment analysis, and word occurrence probability estimation, due to its handling of high-dimensional vectors. I argue, therefore, that this language model shows that certain theoretical notions from Saussurean linguistics (namely system, sign, and value) remain productive for reflecting on the theoretical and epistemological aspects of how meaning is determined in natural languages, as well as on how such notions appear to be emulated by modern NLP techniques such as Word2Vec. The study begins with a critique of the limitations of TF-IDF, traces the influence of Distributional Semantics and the Distributional Hypothesis on modern vector-based language models, and ultimately suggests that Word2Vec shows signs of being able to operationalize, at the level of computational semantics, what Saussure had already formulated conceptually in the early twentieth century: that the meaning of a word is neither fixed nor individual, but relational and determined by the similar and dissimilar values that surround it. The Saussurean sources used to define the conceptual framework are the Course in General Linguistics; the manuscript collection Notes pour le 3e Cours; and the notebook of Émile Constantin, a student in Saussure’s Third Course in General Linguistics, taught in Geneva between 1910 and 1911. The objective is to propose that the Saussurean notions of similia and dissimilia can be identified within the theoretical underpinnings of Word2Vec, promoting a convergence between Saussurean theory and contemporary NLP. The central hypothesis of this work is that Word2Vec can be read as a Saussurean algorithm, as it computationally applies the dynamics of linguistic values to emulate the way meanings are determined through the relations between words, as foreseen by the Genevan master more than a century ago.
id SCI-1_a457ec025954bfbbc34b71e2eab52a1e
oai_identifier_str oai:ops.preprints.scielo.org:preprint/11678
network_acronym_str SCI-1
network_name_str SciELO Preprints
repository_id_str
dc.title.none.fl_str_mv WORD2VEC: A SAUSSUREAN ALGORITHM
Word2Vec: Um algoritmo saussuriano
dc.subject.por.fl_str_mv Saussure
Processamento de Linguagem Natural
Word2Vec
Teoria do Valor
Saussure
Natural Language Processing
Word2Vec
Theory of Value
publishDate 2025
dc.date.none.fl_str_mv 2025-04-17
dc.type.driver.fl_str_mv info:eu-repo/semantics/preprint
info:eu-repo/semantics/publishedVersion
format preprint
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://preprints.scielo.org/index.php/scielo/preprint/view/11678
10.1590/SciELOPreprints.11678
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://preprints.scielo.org/index.php/scielo/preprint/view/11678/21267
dc.rights.driver.fl_str_mv Copyright (c) 2025 Leonardo Giamarusti
https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv SciELO Preprints
dc.source.none.fl_str_mv reponame:SciELO Preprints
instname:Scientific Electronic Library Online (SCIELO)
instacron:SCI
repository.name.fl_str_mv SciELO Preprints - Scientific Electronic Library Online (SCIELO)
repository.mail.fl_str_mv scielo.submission@scielo.org