When External Knowledge Does Not Aggregate in Named Entity Recognition
Main Author: | Privatto, Pedro Ivo Monteiro [UNESP] |
---|---|
Publication Date: | 2021 |
Other Authors: | Guilherme, Ivan Rizzo [UNESP] |
Format: | Conference object |
Language: | eng |
Source: | Repositório Institucional da UNESP |
Download full: | http://dx.doi.org/10.1007/978-3-030-91699-2_42 http://hdl.handle.net/11449/233938 |
Summary: | Across areas of knowledge, textual data are important sources of information. Information Extraction methods have therefore been developed to identify and structure the information present in textual documents. One such task is Named Entity Recognition (NER), which identifies Named Entities, such as Person and Place, in texts using techniques from Natural Language Processing and Machine Learning. Recent works have explored external sources of knowledge to supply Machine Learning models with domain-specific information relevant to the NER task. This work evaluates the aggregation of external knowledge, in the form of Gazetteers and Knowledge Graphs, for the NER task. Our approach is composed of two steps: i) generation of embeddings, and ii) definition and training of the Machine Learning methods. The experiments were conducted on four English datasets, and the results show that the applied strategies for external knowledge integration did not bring large gains to the models, as measured by the F1-score. The F1-score increased in 17 of the 32 cases where external knowledge was used, but in most cases the gain was less than 0.5%. In some scenarios the aggregated external knowledge does not capture relevant content and is therefore not necessarily beneficial to the methodology. |
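
The summary compares models by F1-score with and without external knowledge. As a minimal sketch of how such a comparison could be computed at the entity level, the Python below scores two sets of predicted entity spans against a gold set. It is not the authors' evaluation code: the function name `entity_f1`, the span representation, and the toy data are all assumptions made for illustration, and the numbers are invented.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start token, end token, entity type).
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Entity-level F1: a prediction counts only if span and type both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration, not taken from the paper).
gold = {(0, 1, "PER"), (4, 4, "LOC"), (7, 8, "ORG")}
baseline = {(0, 1, "PER"), (4, 4, "ORG")}        # wrong type on the second span
with_gazetteer = {(0, 1, "PER"), (4, 4, "LOC")}  # type corrected

baseline_f1 = entity_f1(gold, baseline)
gazetteer_f1 = entity_f1(gold, with_gazetteer)
gain = gazetteer_f1 - baseline_f1  # absolute F1 difference between the two runs
```

A gain computed this way is an absolute difference in F1 between two runs. Whether the paper reports absolute or relative differences is not stated in this record.
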
id |
UNSP_138468a84eee9f53c3f7454a65ebd78d |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/233938 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
spelling |
When External Knowledge Does Not Aggregate in Named Entity Recognition; Information extraction; Knowledge embeddings; Named entity recognition; Institute of Geosciences and Exact Sciences UNESP - São Paulo State University, SP; Universidade Estadual Paulista (UNESP); Privatto, Pedro Ivo Monteiro [UNESP]; Guilherme, Ivan Rizzo [UNESP]; 2022-05-01T11:54:06Z; 2021-01-01; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; 616-627; http://dx.doi.org/10.1007/978-3-030-91699-2_42; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 13074 LNAI, p. 616-627; 1611-3349; 0302-9743; http://hdl.handle.net/11449/233938; 10.1007/978-3-030-91699-2_42; 2-s2.0-85121798857; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; info:eu-repo/semantics/openAccess; 2024-11-27T14:10:17Z; oai:repositorio.unesp.br:11449/233938; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; repositoriounesp@unesp.br; opendoar:2946; 2024-11-27T14:10:17; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false |
dc.title.none.fl_str_mv |
When External Knowledge Does Not Aggregate in Named Entity Recognition |
author |
Privatto, Pedro Ivo Monteiro [UNESP] |
author_facet |
Privatto, Pedro Ivo Monteiro [UNESP]; Guilherme, Ivan Rizzo [UNESP] |
author_role |
author |
author2 |
Guilherme, Ivan Rizzo [UNESP] |
author2_role |
author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (UNESP) |
dc.contributor.author.fl_str_mv |
Privatto, Pedro Ivo Monteiro [UNESP]; Guilherme, Ivan Rizzo [UNESP] |
dc.subject.por.fl_str_mv |
Information extraction; Knowledge embeddings; Named entity recognition |
topic |
Information extraction; Knowledge embeddings; Named entity recognition |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-01-01 2022-05-01T11:54:06Z 2022-05-01T11:54:06Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://dx.doi.org/10.1007/978-3-030-91699-2_42; Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 13074 LNAI, p. 616-627; 1611-3349; 0302-9743; http://hdl.handle.net/11449/233938; 10.1007/978-3-030-91699-2_42; 2-s2.0-85121798857 |
url |
http://dx.doi.org/10.1007/978-3-030-91699-2_42 http://hdl.handle.net/11449/233938 |
identifier_str_mv |
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 13074 LNAI, p. 616-627; 1611-3349; 0302-9743; 10.1007/978-3-030-91699-2_42; 2-s2.0-85121798857 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
616-627 |
dc.source.none.fl_str_mv |
Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
repository.mail.fl_str_mv |
repositoriounesp@unesp.br |
_version_ |
1834483162804125696 |