Bibliographic details
Year of defense: |
2024 |
Main author: |
Santos Júnior, Valmir Oliveira dos |
Advisor: |
Not provided by the institution |
Examination committee: |
Not provided by the institution |
Document type: |
Master's dissertation
|
Access type: |
Open access |
Language: |
por |
Degree-granting institution: |
Not provided by the institution
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
http://repositorio.ufc.br/handle/riufc/78233
|
Abstract: |
It is increasingly common to use chatbots as service interfaces. One of the main components of a chatbot is the NLU module, which interprets the text, extracts the intent, and identifies the entities present. It is possible to focus on just one of these NLU tasks, such as intent classification. Training an NLU intent classification model usually requires a considerable amount of annotated data, in which each sentence in the dataset is labeled with an intent. Depending on the volume of data, manual labeling can be laborious and time-consuming. An unsupervised machine learning technique, such as data clustering, can therefore be applied to find and label patterns. This task depends on an effective vector representation of text, one that captures semantic information and helps the machine understand the context, intent, and other nuances of the entire text. This work extensively evaluates different text embedding models for clustering and labeling. Some operations are also applied to improve the dataset's quality, in which the least representative sentences of each generated group are discarded. Intent classification models are then trained with two neural network architectures, using service text from PPC. A dataset was also manually annotated to serve as validation data. The semiautomatic labeling, implemented through data clustering and visual inspection, was also studied: it introduced some labeling errors into the intent classification models, but manually annotating the entire dataset would have been infeasible. Nonetheless, the resulting models achieved over 98% accuracy on test data and over 96% on validation data. |
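The abstract mentions discarding the least representative sentences of each cluster but does not say how representativeness is measured. The sketch below illustrates one plausible reading, assuming that a sentence is less representative the farther its embedding lies from its cluster centroid. The function name `filter_by_centroid_distance`, the `keep_fraction` parameter, and the toy vectors are hypothetical and are not taken from the dissertation.

```python
import math
from collections import defaultdict


def filter_by_centroid_distance(vectors, cluster_ids, keep_fraction=0.8):
    """Return sorted indices of the sentences kept after pruning each cluster.

    Assumption (not from the source): "least representative" means
    "farthest from the cluster centroid" in Euclidean distance.
    """
    # Group sentence indices by their cluster id.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    kept = []
    for idxs in members.values():
        dim = len(vectors[idxs[0]])
        # Centroid = component-wise mean of the cluster's embeddings.
        centroid = [sum(vectors[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # Rank members from closest to farthest from the centroid.
        ranked = sorted(idxs, key=lambda i: math.dist(vectors[i], centroid))
        # Keep the closest share of the cluster, but never fewer than one sentence.
        n_keep = max(1, math.floor(keep_fraction * len(idxs)))
        kept.extend(ranked[:n_keep])
    return sorted(kept)


# Toy 2-D "embeddings": each cluster has three tight points and one outlier.
toy_vectors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (3.0, 3.0),
]
toy_clusters = [0, 0, 0, 0, 1, 1, 1, 1]
kept = filter_by_centroid_distance(toy_vectors, toy_clusters, keep_fraction=0.75)
print(kept)  # indices 3 and 7 (the outliers) are dropped
```

In practice the vectors would come from a text embedding model and the cluster ids from a clustering algorithm. The fraction to keep is a tuning choice that trades dataset size against label purity.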