A multilingual and multidomain study on dialog act recognition using character-level tokenization
Autor(a) principal: | |
---|---|
Data de Publicação: | 2019 |
Outros Autores: | , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Texto Completo: | http://hdl.handle.net/10071/17539 |
Resumo: | Automatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by its conversational partners. Although most approaches on the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We explored the use of multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages—English, Spanish, and German—which have different morphological characteristics. Furthermore, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The achieved results not only show that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches on the task, but also that both approaches are able to capture complementary information. Thus, the best results are achieved by combining tokenization at both levels. |
id |
RCAP_6a6e418fd6eec78a48e66a7fa8c1e18f |
---|---|
oai_identifier_str |
oai:repositorio.iscte-iul.pt:10071/17539 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
A multilingual and multidomain study on dialog act recognition using character-level tokenizationDialog act recognitionCharacter-levelMultilingualityMultidomainAutomatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by its conversational partners. Although most approaches on the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We explored the use of multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages—English, Spanish, and German—which have different morphological characteristics. Furthermore, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The achieved results not only show that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches on the task, but also that both approaches are able to capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.MDPI2019-03-08T15:20:26Z2019-01-01T00:00:00Z20192024-11-07T09:31:44Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://hdl.handle.net/10071/17539eng2078-248910.3390/info10030094Ribeiro, E.Ribeiro, R.Matos, D. M. de.info:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2024-11-10T01:16:22Zoai:repositorio.iscte-iul.pt:10071/17539Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T18:20:54.413060Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse |
dc.title.none.fl_str_mv |
A multilingual and multidomain study on dialog act recognition using character-level tokenization |
title |
A multilingual and multidomain study on dialog act recognition using character-level tokenization |
spellingShingle |
A multilingual and multidomain study on dialog act recognition using character-level tokenization Ribeiro, E. Dialog act recognition Character-level Multilinguality Multidomain |
title_short |
A multilingual and multidomain study on dialog act recognition using character-level tokenization |
title_full |
A multilingual and multidomain study on dialog act recognition using character-level tokenization |
title_fullStr |
A multilingual and multidomain study on dialog act recognition using character-level tokenization |
title_full_unstemmed |
A multilingual and multidomain study on dialog act recognition using character-level tokenization |
title_sort |
A multilingual and multidomain study on dialog act recognition using character-level tokenization |
author |
Ribeiro, E. |
author_facet |
Ribeiro, E. Ribeiro, R. Matos, D. M. de. |
author_role |
author |
author2 |
Ribeiro, R. Matos, D. M. de. |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Ribeiro, E. Ribeiro, R. Matos, D. M. de. |
dc.subject.por.fl_str_mv |
Dialog act recognition Character-level Multilinguality Multidomain |
topic |
Dialog act recognition Character-level Multilinguality Multidomain |
description |
Automatic dialog act recognition is an important step for dialog systems since it reveals the intention behind the words uttered by its conversational partners. Although most approaches on the task use word-level tokenization, there is information at the sub-word level that is related to the function of the words and, consequently, their intention. Thus, in this study, we explored the use of character-level tokenization to capture that information. We explored the use of multiple character windows of different sizes to capture morphological aspects, such as affixes and lemmas, as well as inter-word information. Furthermore, we assessed the importance of punctuation and capitalization for the task. To broaden the conclusions of our study, we performed experiments on dialogs in three languages—English, Spanish, and German—which have different morphological characteristics. Furthermore, the dialogs cover multiple domains and are annotated with both domain-dependent and domain-independent dialog act labels. The achieved results not only show that the character-level approach leads to similar or better performance than the state-of-the-art word-level approaches on the task, but also that both approaches are able to capture complementary information. Thus, the best results are achieved by combining tokenization at both levels. |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019-03-08T15:20:26Z 2019-01-01T00:00:00Z 2019 2024-11-07T09:31:44Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10071/17539 |
url |
http://hdl.handle.net/10071/17539 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
2078-2489 10.3390/info10030094 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
MDPI |
publisher.none.fl_str_mv |
MDPI |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833597349836881920 |