Bibliographic details
Year of defense: |
2020 |
Main author: |
Lemes, Jonatan de Sá |
Advisor: |
Bastos, Marcus Vinicius Fainer |
Examination committee: |
Not provided by the institution |
Document type: |
Master's dissertation
|
Access type: |
Open access |
Language: |
Portuguese |
Defending institution: |
Pontifícia Universidade Católica de São Paulo
|
Graduate program: |
Programa de Estudos Pós-Graduados em Tecnologias da Inteligência e Design Digital
|
Department: |
Faculdade de Ciências Exatas e Tecnologia
|
Country: |
Brazil
|
Keywords in Portuguese: |
|
Keywords in English: |
|
CNPq knowledge area: |
|
Access link: |
https://tede2.pucsp.br/handle/handle/23285
|
Abstract: |
As interaction between customers and digital platforms increases, there is a need for increasingly sophisticated solutions that reduce costs and meet the growing demand for customer service across many business niches. In this context, computer programs called Chatbots have emerged, which aim, to some extent, to meet this need. Building Chatbots on modern platforms requires designers to supply a body of prior knowledge before the bots become functional, which raises a question: what content should be anticipated as a knowledge base? Which Entities, Intents and Dialogues are expected for the business? This research seeks to quantify and explore methods of extracting information from known data sources in service channels. Its main objective is to support the Chatbot designer in creating service scripts without depending on their empirical experience or on diffuse information about the business. Statistical and probabilistic techniques are considered for extracting information from data sources, whether structured or unstructured.
The most common approach to building Chatbots is described in terms of the concepts of Entity, Intent and Dialogue, along with an alternative approach based on Markov Chains, and a case study is proposed. For information extraction, the following were considered: conversion of audio to text (speech-to-text), which showed a slight to moderate loss in form; generation and pre-processing of a linguistic and computational corpus (tokenization, stemming, lemmatization, filters); term frequency calculation (TF), considered satisfactory because it reveals the vocabulary of the business; term relevance (TF-IDF), considered unsatisfactory because it surfaced terms that are common and irrelevant to the business; part-of-speech labeling (POS tagging), considered satisfactory, although with processing limitations; entity extraction (NER), considered satisfactory, with accuracy restrictions linked to the training set used; extraction of Intents and Dialogues using syntactic labeling, which proved burdensome for human analysis because of the volume of sentences generated; clustering of terms (KMeans) with dimensionality reduction (PCA), considered unsatisfactory because of the sparsity of the data; and probabilistic text classification (Bayes), considered satisfactory, although with quality dependent on the training set. Finally, a software model (UML) is proposed, with use case, class and sequence diagrams, an entity-relationship model (MER) for data persistence, and screen prototypes for the intended support software. The overall conclusion is that a considerable amount of information for designing a Chatbot can be extracted by applying the techniques described in the research. It should be noted that the cognitive effort required of the designer can vary with the volume of data to be processed. |