A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting
| Main author: | Silva, Sara |
|---|---|
| Publication date: | 2018 |
| Other authors: | Vanneschi, Leonardo; Cabral, Ana I.R.; Vasconcelos, Maria J. |
| Document type: | Article |
| Language: | eng |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | http://hdl.handle.net/10362/151417 |
| Abstract: | Silva, S., Vanneschi, L., Cabral, A. I. R., & Vasconcelos, M. J. (2018). A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting. Swarm and Evolutionary Computation, 39(April), 323-338. DOI: 10.1016/j.swevo.2017.11.003 |
| id | RCAP_56ca302a11e3c6561d4d27096977a454 |
|---|---|
| oai_identifier_str | oai:run.unl.pt:10362/151417 |
| network_acronym_str | RCAP |
| network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str | https://opendoar.ac.uk/repository/7160 |
| spelling | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting; Classification; Data errors; Genetic Programming; Hidden overfitting; Noisy labels; Semi-supervised learning; Computer Science(all); Mathematics(all); Silva, S., Vanneschi, L., Cabral, A. I. R., & Vasconcelos, M. J. (2018). A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting. Swarm and Evolutionary Computation, 39(April), 323-338. DOI: 10.1016/j.swevo.2017.11.003; Data gathered in the real world normally contains noise, either stemming from inaccurate experimental measurements or introduced by human errors. Our work deals with classification data where the attribute values were accurately measured, but the categories may have been mislabeled by the human in several sample points, resulting in unreliable training data. Genetic Programming (GP) compares favorably with the Classification and Regression Trees (CART) method, but it is still highly affected by these errors. Despite consistently achieving high accuracy in both training and test sets, many classification errors are found in a later validation phase, revealing a previously hidden overfitting to the erroneous data. Furthermore, the evolved models frequently output raw values that are far from the expected range. To improve the behavior of the evolved models, we extend the original training set with additional sample points where the class label is unknown, and devise a simple way for GP to use this additional information and learn in a semi-supervised manner. The results are surprisingly good. In the presence of the exact same mislabeling errors, the additional unlabeled data allowed GP to evolve models that achieved high accuracy also in the validation phase. This is a brand new approach to semi-supervised learning that opens an array of possibilities for making the most of the abundance of unlabeled data available today, in a simple and inexpensive way.; NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN; Silva, Sara; Vanneschi, Leonardo; Cabral, Ana I.R.; Vasconcelos, Maria J.; 2024-01-27T01:32:02Z; 2018-04-01; 2018-04-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; 16; application/pdf; http://hdl.handle.net/10362/151417; eng; 2210-6502; PURE: 3788203; https://doi.org/10.1016/j.swevo.2017.11.003; info:eu-repo/semantics/openAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2024-05-22T18:10:41Z; oai:run.unl.pt:10362/151417; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-28T17:41:05.735777; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
| dc.title.none.fl_str_mv | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting |
| title | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting |
| spellingShingle | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting; Silva, Sara; Classification; Data errors; Genetic Programming; Hidden overfitting; Noisy labels; Semi-supervised learning; Computer Science(all); Mathematics(all) |
| title_short | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting |
| title_full | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting |
| title_fullStr | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting |
| title_full_unstemmed | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting |
| title_sort | A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting |
| author | Silva, Sara |
| author_facet | Silva, Sara; Vanneschi, Leonardo; Cabral, Ana I.R.; Vasconcelos, Maria J. |
| author_role | author |
| author2 | Vanneschi, Leonardo; Cabral, Ana I.R.; Vasconcelos, Maria J. |
| author2_role | author; author; author |
| dc.contributor.none.fl_str_mv | NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN |
| dc.contributor.author.fl_str_mv | Silva, Sara; Vanneschi, Leonardo; Cabral, Ana I.R.; Vasconcelos, Maria J. |
| dc.subject.por.fl_str_mv | Classification; Data errors; Genetic Programming; Hidden overfitting; Noisy labels; Semi-supervised learning; Computer Science(all); Mathematics(all) |
| topic | Classification; Data errors; Genetic Programming; Hidden overfitting; Noisy labels; Semi-supervised learning; Computer Science(all); Mathematics(all) |
| description | Silva, S., Vanneschi, L., Cabral, A. I. R., & Vasconcelos, M. J. (2018). A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting. Swarm and Evolutionary Computation, 39(April), 323-338. DOI: 10.1016/j.swevo.2017.11.003 |
| publishDate | 2018 |
| dc.date.none.fl_str_mv | 2018-04-01; 2018-04-01T00:00:00Z; 2024-01-27T01:32:02Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
| format | article |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10362/151417 |
| url | http://hdl.handle.net/10362/151417 |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | 2210-6502; PURE: 3788203; https://doi.org/10.1016/j.swevo.2017.11.003 |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | 16; application/pdf |
| dc.source.none.fl_str_mv | reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| instname_str | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str | RCAAP |
| institution | RCAAP |
| reponame_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv | info@rcaap.pt |
| _version_ | 1833596888878678016 |