Geração de vetores de sentido para o português

Bibliographic Details
Main Author: Silva, Jéssica Rodrigues da
Publication Date: 2019
Format: Master thesis
Language: por
Source: Repositório Institucional da UFSCAR
Download full: https://repositorio.ufscar.br/handle/20.500.14289/11792
Summary: Numerical vector representations can encode anything from words to meanings in a low-dimensional continuous space. These representations rely on distributional modeling, in which the context where a word occurs is taken into account when its vector is generated. Word representations, known as word embeddings or word vectors (Word2vec, FastText, Wang2vec and GloVe), have been widely used but have an important limitation: they produce a single vector for each word, ignoring the fact that an ambiguous word can take on different meanings in different contexts. This mixture of meanings can be a problem for many applications. For example, in a language comprehension task, the vector of an ambiguous word such as "bank" (Portuguese "banco") mixes all of its possible meanings (financial institution, blood bank, or item of furniture) into a single numerical vector, causing an erroneous semantic interpretation of the sentence in which it occurs. Over the last few years, representations of meanings, known as sense embeddings or sense vectors, have proven able to model syntactic and semantic knowledge and have been used in NLP applications. Because they turn the various meanings of an ambiguous word into separate numerical vectors, sense vectors can be applied to Word Sense Disambiguation (WSD). This work generated and evaluated sense vectors for Portuguese (PT-BR and PT-EU) and showed that they outperform traditional word vectors in intrinsic and extrinsic NLP tasks, since they can deal with lexical ambiguity. To the best of our knowledge, this is the first work to address the generation and evaluation of sense vectors for Portuguese.
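The contrast the summary draws between a single word vector and separate sense vectors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors, the sense labels such as `banco#furniture`, and the helper names are hypothetical toy values, not data or methods from the thesis. Averaging the senses stands in for the "mixed" word vector, and cosine similarity against a context vector stands in for a very simple form of disambiguation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy sense vectors for "banco"; the three dimensions loosely
# stand for (finance, medicine, furniture). These values are invented.
SENSE_VECTORS = {
    "banco#finance": [0.9, 0.1, 0.0],
    "banco#blood": [0.1, 0.9, 0.0],
    "banco#furniture": [0.0, 0.1, 0.9],
}


def mixed_word_vector(sense_vectors):
    """Average all senses into one vector, mimicking a single word vector."""
    vectors = list(sense_vectors.values())
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def disambiguate(context_vector, sense_vectors):
    """Return the sense label whose vector is closest to the context."""
    return max(
        sense_vectors,
        key=lambda sense: cosine(context_vector, sense_vectors[sense]),
    )


WORD_VECTOR = mixed_word_vector(SENSE_VECTORS)

# Toy context vector for a sentence about sitting on a bench in a park.
PARK_CONTEXT = [0.05, 0.05, 0.95]

if __name__ == "__main__":
    print("chosen sense:", disambiguate(PARK_CONTEXT, SENSE_VECTORS))
    print("similarity to mixed word vector:", cosine(PARK_CONTEXT, WORD_VECTOR))
```

In this toy setup the furniture sense matches the park context far more closely than the averaged vector does, which is the intuition behind using sense vectors for WSD. Real sense embeddings are learned from corpora rather than written by hand.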
id SCAR_5c83ea9828c53b27d724b8a907ec2a30
oai_identifier_str oai:repositorio.ufscar.br:20.500.14289/11792
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str 4322
spelling Silva, Jéssica Rodrigues da
Caseli, Helena de Medeiros
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
CAPES: Finance Code 001
FAPESP: 2016/13002-0
RODRIGUES_jessica_2019.pdf (application/pdf, 1894553 bytes)
dc.title.por.fl_str_mv Geração de vetores de sentido para o português
dc.title.alternative.eng.fl_str_mv Generating sense embeddings for Portuguese
title Geração de vetores de sentido para o português
author Silva, Jéssica Rodrigues da
author_facet Silva, Jéssica Rodrigues da
author_role author
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/8077169220013752
dc.contributor.author.fl_str_mv Silva, Jéssica Rodrigues da
dc.contributor.advisor1.fl_str_mv Caseli, Helena de Medeiros
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/6608582057810385
dc.contributor.authorID.fl_str_mv a98b2fa2-5521-403c-bfc2-f5fae6db5c2c
contributor_str_mv Caseli, Helena de Medeiros
dc.subject.por.fl_str_mv Vetores de sentido
Desambiguação lexical de sentidos
Vetores de palavra
Modelagem distribucional
topic Vetores de sentido
Desambiguação lexical de sentidos
Vetores de palavra
Modelagem distribucional
Sense embeddings
Sense vectors
Word sense disambiguation
Word embeddings
Word vectors
Distributional modeling
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Sense embeddings
Sense vectors
Word sense disambiguation
Word embeddings
Word vectors
Distributional modeling
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate 2019
dc.date.accessioned.fl_str_mv 2019-09-06T19:06:09Z
dc.date.available.fl_str_mv 2019-09-06T19:06:09Z
dc.date.issued.fl_str_mv 2019-07-03
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv SILVA, Jéssica Rodrigues da. Geração de vetores de sentido para o português. 2019. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2019. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/11792.
dc.identifier.uri.fl_str_mv https://repositorio.ufscar.br/handle/20.500.14289/11792
identifier_str_mv SILVA, Jéssica Rodrigues da. Geração de vetores de sentido para o português. 2019. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2019. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/11792.
url https://repositorio.ufscar.br/handle/20.500.14289/11792
dc.language.iso.fl_str_mv por
language por
dc.relation.confidence.fl_str_mv 600
600
dc.relation.authority.fl_str_mv e36d4e63-960d-4f5c-9c93-f8b7f5f93d65
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher.initials.fl_str_mv UFSCar
publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
instname_str Universidade Federal de São Carlos (UFSCAR)
instacron_str UFSCAR
institution UFSCAR
reponame_str Repositório Institucional da UFSCAR
collection Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstreams/507e1f18-99f8-4bd2-adfe-1b7ddfbeb7e7/download
https://repositorio.ufscar.br/bitstreams/74da2300-3855-4a14-a8dc-21fc3545855f/download
https://repositorio.ufscar.br/bitstreams/6f59ee4f-2d15-4093-9983-43a74f2451fd/download
https://repositorio.ufscar.br/bitstreams/6298d108-ae74-4e0a-a031-a2ffed1faf01/download
bitstream.checksum.fl_str_mv 5f313c106a7db911fa42ca0d8087c6da
ae0398b6f8b235e40ad82cba6c50031d
91a401d3b143c6990dedb87364f07985
19f180e8f06e57a0ba0ddeaf6815187d
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv repositorio.sibi@ufscar.br
_version_ 1834468942222983168