Vehicle industry big data analysis using clustering approaches

Seixas, Lenon Diniz

Bibliographic details
Main author:	Seixas, Lenon Diniz
Publication date:	2022
Document type:	Master's thesis
Language:	eng
Source title:	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
Full text:	http://repositorio.utfpr.edu.br/jspui/handle/1/31331
Abstract:	Working with data has become essential in the modern world. In a globalized economy and industry, data analysis and visualization provide valuable information for decision making and strategic planning. Data science offers a range of statistical and scientific methods for extracting as much value as possible from a data set, covering the preparation, cleaning, aggregation, and manipulation of data. Machine Learning (ML) and Artificial Intelligence (AI) complement these methods by learning from and exploring the data, uncovering patterns that cannot be seen through the analyst's experience alone. The automotive sector, which significantly influences the world economy and industry, feels the effects of its focus on technology and on relatively short time frames. Digital transformation therefore has a disruptive effect, considered the most important phenomenon in the sector's 140 years, leading car companies to offer customized products optimized for customer needs. Doing so requires working extensively with data and data science. This work therefore studies clustering methods on a Big Data dataset from a car company: it reviews the literature; treats, normalizes, and groups the data using the methods surveyed; and, finally, compares and analyzes the results. The Knowledge Discovery and Data mining method was used to carry out the mining process, comparing the performance of the K-Means, Fuzzy C-Means (FCM), and Self-Organizing Maps (SOM) algorithms using four metrics: sum of squares within clusters (SSW), sum of squares between clusters (SSB), silhouette index (SI), and K-Fold cross-validation with homogeneity score. Looking at how the results distribute the vehicles, the ML algorithms tend to spread them more evenly among the clusters than the classification without learning does, and the SI metric supports this as a good decision. 
The methods used in this work showed satisfactory results on the dataset and demonstrate how applying ML can benefit data mining. With this, we were able to answer the question "How can historical usage data help a truck manufacturer improve product development and fuel consumption?". K-Means is a good general-purpose clustering technique, while FCM also proved to be a good technique, working well mainly in situations where clusters overlap. FCM additionally provides a cluster membership percentage for each item, which can help end users understand the data further. In future work, K-Medoids could be implemented as an alternative method that takes an actual individual as the center of each cluster. This work can also be extended to other types of vehicle data sets.
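
The three internal metrics named in the abstract (SSW, SSB, and SI) can be illustrated with a short sketch. This is not the dissertation's code: the data is a synthetic pair of grid-shaped blobs, the K-Means implementation is a plain Lloyd's iteration, and the deterministic farthest-first initialization and all function names are assumptions chosen to keep the example dependency-free. FCM, SOM, and the K-Fold homogeneity evaluation are not covered.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def farthest_first(points, k):
    """Hypothetical deterministic init: start at points[0], then repeatedly
    add the point farthest from the centroids chosen so far."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return chosen


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-Means; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = centroid(members)
    return centroids, labels


def ssw_ssb(points, centroids, labels):
    """Sum of squares within clusters and between clusters."""
    g = centroid(points)  # global mean of the data
    ssw = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    ssb = sum(labels.count(j) * sq_dist(c, g) for j, c in enumerate(centroids))
    return ssw, ssb


def silhouette(points, labels):
    """Mean silhouette index; assumes at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = {l: [] for l in set(labels)}
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(sq_dist(p, q) ** 0.5)
        own = dists.pop(labels[i])
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                            # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic data: two 5x5 grids of points, one shifted by (10, 10).
blob = [(0.5 * x, 0.5 * y) for x in range(5) for y in range(5)]
points = blob + [(x + 10, y + 10) for x, y in blob]

centroids, labels = kmeans(points, 2)
ssw, ssb = ssw_ssb(points, centroids, labels)
sst = sum(sq_dist(p, centroid(points)) for p in points)
si = silhouette(points, labels)
print(f"SSW={ssw:.2f} SSB={ssb:.2f} SST={sst:.2f} SI={si:.3f}")
```

A lower SSW and a higher SSB indicate tighter, better separated clusters, and when every centroid is the mean of its members the two sum to the total sum of squares. SI lies between -1 and 1, with values near 1 indicating that items sit much closer to their own cluster than to the nearest other one.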

Item metadata

id	UTFPR-12_e91a5865ae3bb06d2fb8a2dcd0297555
oai_identifier_str	oai:repositorio.utfpr.edu.br:1/31331
network_acronym_str	UTFPR-12
network_name_str	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
repository_id_str
spelling	Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). Working with data has become fundamental and indispensable in the modern world. In a globalized world economy and industry, data analysis and visualization provide enlightening information for decision making and strategic planning. To extract as much value as possible from a data set, data science offers a variety of statistical and scientific methods, covering all the preparation, cleaning, aggregation, and manipulation of data. Machine Learning (ML) and Artificial Intelligence (AI) come together to learn from and explore the data, discovering things that cannot be seen with the analyst's experience alone. The automotive sector, which has great influence on the world economy and industry, feels the impact of being focused on technology and on relatively short time frames. Digital transformation therefore has a disruptive effect, considered the most important phenomenon in the sector's 140 years, leading automobile companies to offer products that are customized and optimized for customer needs. Doing this requires working extensively with data and data science. This work therefore presents a study investigating clustering methods on a Big Data dataset from an automobile company, carrying out a literature review; treating, normalizing, and grouping the data using the methods researched; and, finally, comparing and analyzing the results. The Knowledge Discovery and Data mining method was used to carry out the mining process, comparing the performance of the K-Means, Fuzzy C-Means (FCM), and Self-Organizing Maps (SOM) algorithms by means of several metrics: sum of squares within clusters (SSW), sum of squares between clusters (SSB), silhouette index (SI), and K-Fold cross-validation with homogeneity score. For the slope parameter, the ML algorithms gave a better response overall, on the metrics presented, than the rule-based classification method called GTA, which is not a machine learning algorithm. Of the ML algorithms implemented, K-Means and Fuzzy C-Means, K-Means is slightly superior on the SSW and SSB metrics, while Fuzzy C-Means is better on the SI and cross-validation metrics. When the results as a whole are analyzed, the ML algorithms tend to distribute the population more evenly among the clusters than the classification without learning does, and the SI metric confirms this as a good decision. The methods used in this work gave satisfactory results on the data set and show how applying ML can benefit data mining. With this, we were able to answer the question "How can historical usage data help a truck manufacturer improve product development and fuel consumption?". K-Means is a good option as a clustering technique, while FCM also proved to be a good technique, working well mainly in overlapping situations. FCM also provides an extra interpretation, the cluster membership percentage, which can help end users understand the data even further. In future work, K-Medoids could also be implemented as an alternative method that considers an individual as the center of the cluster. This work can be extended to other types of data sets.
dc.title.none.fl_str_mv	Vehicle industry big data analysis using clustering approaches Análise de dados volumosos da indústria de veículos usando abordagem de agrupamento
title	Vehicle industry big data analysis using clustering approaches
author	Seixas, Lenon Diniz
author_facet	Seixas, Lenon Diniz
author_role	author
dc.contributor.none.fl_str_mv	Corrêa, Fernanda Cristina https://orcid.org/0000-0003-4907-0395 http://lattes.cnpq.br/1495216809511536 Martins, Marcella Scoczynski Ribeiro https://orcid.org/0000-0002-5716-4968 http://lattes.cnpq.br/5212122361603572 Corrêa, Fernanda Cristina https://orcid.org/0000-0003-4907-0395 http://lattes.cnpq.br/1495216809511536 Martins, Marcella Scoczynski Ribeiro https://orcid.org/0000-0002-5716-4968 http://lattes.cnpq.br/5212122361603572 Reis, Márcio Rodrigues da Cunha https://orcid.org/0000-0002-5555-7389 http://lattes.cnpq.br/1167385371830496 Delgado, Myriam Regattieri de Biase da Silva https://orcid.org/0000-0002-2791-174X http://lattes.cnpq.br/4166922845507601
dc.contributor.author.fl_str_mv	Seixas, Lenon Diniz
dc.subject.por.fl_str_mv	Big data; Machine learning; Industries; Automation; Learning; Cluster analysis; Motor vehicle industry; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA; Engineering/Technology/Management
topic	Big data; Machine learning; Industries; Automation; Learning; Cluster analysis; Motor vehicle industry; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA; Engineering/Technology/Management
publishDate	2022
dc.date.none.fl_str_mv	2022-11-16 2023-05-04T13:54:52Z 2023-05-04T13:54:52Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	SEIXAS, Lenon Diniz. Vehicle industry big data analysis using clustering approaches. 2022. Dissertação (Mestrado em Engenharia Elétrica) - Universidade Tecnológica Federal do Paraná, Ponta Grossa, 2022. http://repositorio.utfpr.edu.br/jspui/handle/1/31331
identifier_str_mv	SEIXAS, Lenon Diniz. Vehicle industry big data analysis using clustering approaches. 2022. Dissertação (Mestrado em Engenharia Elétrica) - Universidade Tecnológica Federal do Paraná, Ponta Grossa, 2022.
url	http://repositorio.utfpr.edu.br/jspui/handle/1/31331
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Tecnológica Federal do Paraná Ponta Grossa Brasil Programa de Pós-Graduação em Engenharia Elétrica UTFPR
publisher.none.fl_str_mv	Universidade Tecnológica Federal do Paraná Ponta Grossa Brasil Programa de Pós-Graduação em Engenharia Elétrica UTFPR
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) instname:Universidade Tecnológica Federal do Paraná (UTFPR) instacron:UTFPR
instname_str	Universidade Tecnológica Federal do Paraná (UTFPR)
instacron_str	UTFPR
institution	UTFPR
reponame_str	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
collection	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
repository.name.fl_str_mv	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) - Universidade Tecnológica Federal do Paraná (UTFPR)
repository.mail.fl_str_mv	riut@utfpr.edu.br || sibi@utfpr.edu.br
_version_	1850497944600444928
