Categorical Attribute traNsformation Environment (CANE): A python module for categorical to numeric data preprocessing

Bibliographic Details
Main Author: Matos, Luís Miguel
Publication Date: 2022
Other Authors: Azevedo, João, Matta, Arthur, Pilastri, André, Cortez, Paulo, Mendes, Rui
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://hdl.handle.net/1822/81434
Summary: Categorical Attribute traNsformation Environment (CANE) is a simple but powerful Python package for preprocessing categorical data. The package is valuable because a large range of Machine Learning (ML) algorithms can only be trained using numerical data (e.g., Deep Learning, Support Vector Machines), while several real-world ML applications involve categorical data attributes. Currently, CANE offers three categorical-to-numeric transformation methods: Percentage Categorical Pruned (PCP), Inverse Document Frequency (IDF) and a simpler One-Hot-Encoding method. Additionally, the CANE module is well documented, with several code examples that can help its adoption by non-expert users.
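The three transformations named in the summary can be illustrated with a minimal pandas sketch. This is not CANE's API: the function names (`one_hot`, `idf`, `pcp`), the `keep` and `other` parameters, and the toy `city` column are hypothetical, and the definitions are assumptions (IDF taken as the natural log of the row count divided by the level frequency; PCP taken as keeping the most frequent levels up to a coverage threshold, merging the rest into one level, then one-hot encoding the result).

```python
import math

import pandas as pd


def one_hot(col: pd.Series) -> pd.DataFrame:
    """One binary column per distinct level."""
    return pd.get_dummies(col, prefix=col.name, dtype=int)


def idf(col: pd.Series) -> pd.Series:
    """Map each level to ln(n / f), with n rows and f occurrences of the level.

    Assumed definition: rarer levels receive larger values.
    """
    counts = col.value_counts()
    n = len(col)
    return col.map(lambda level: math.log(n / counts[level]))


def pcp(col: pd.Series, keep: float = 0.9, other: str = "Others") -> pd.DataFrame:
    """Keep the most frequent levels until they cover at least `keep` of the
    rows, merge the remaining levels into `other`, then one-hot encode.

    The threshold semantics are an assumption made for this sketch.
    """
    counts = col.value_counts()            # sorted by decreasing frequency
    covered_before = counts.cumsum() - counts
    kept = counts.index[covered_before < keep * len(col)]
    pruned = col.where(col.isin(kept), other)
    return pd.get_dummies(pruned, prefix=col.name, dtype=int)


# Toy column (hypothetical data): two frequent levels and two rare ones.
city = pd.Series(
    ["Porto"] * 5 + ["Braga"] * 3 + ["Faro", "Lagos"], name="city"
)

print(one_hot(city).shape)            # four levels -> four binary columns
print(idf(city).round(3).tolist())    # one numeric value per row
print(pcp(city, keep=0.7).columns.tolist())
```

In this sketch, one-hot encoding grows by one column per level, IDF always yields a single numeric column, and the PCP-style pruning sits in between by bounding the number of columns through the merged level.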
id RCAP_698376a86f20dbf9cf3131a8035a2d32
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/81434
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
Acknowledgments: The authors are grateful for project NORTE-01-0247-FEDER-017497, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). This work was also supported by FCT Fundação para a Ciência e Tecnologia, Portugal within the Project Scope: UID/CEC/00319/2019. The authors are also grateful for all the contributors that assisted in making CANE more intuitive.
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Matos, Luís Miguel
Azevedo, João
Matta, Arthur
Pilastri, André
Cortez, Paulo
Mendes, Rui
dc.subject.por.fl_str_mv CANE
Data preprocessing
Machine learning
Python programming language
Science & Technology
dc.date.none.fl_str_mv 2022-08-01
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/81434
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Matos, L. M., Azevedo, J., Matta, A., Pilastri, A., Cortez, P., & Mendes, R. (2022, August). Categorical Attribute traNsformation Environment (CANE): A python module for categorical to numeric data preprocessing. Software Impacts. Elsevier BV. http://doi.org/10.1016/j.simpa.2022.100359
2665-9638
10.1016/j.simpa.2022.100359
https://www.sciencedirect.com/science/article/pii/S2665963822000720
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.mail.fl_str_mv info@rcaap.pt