Text based classification of companies in CrunchBase

Bibliographic Details
Main Author: Batista, F.
Publication Date: 2015
Other Authors: João P. Carvalho
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10071/25098
Summary: This paper introduces two fuzzy fingerprint based text classification techniques that were successfully applied to automatically label companies from CrunchBase, based purely on their unstructured textual description. This is a real and very challenging problem due to the large set of possible labels (more than 40) and also to the fact that the textual descriptions do not have to abide by any criteria and are, therefore, extremely heterogeneous. Fuzzy fingerprints are a recently introduced technique that can be used for performing fast classification. They perform well in the presence of unbalanced datasets and can cope with a very large number of classes. In the paper, a comparison is performed against some of the best text classification techniques commonly used to address similar problems. When applied to the CrunchBase dataset, the fuzzy fingerprint based approach outperformed the other techniques.
id RCAP_8905ea78b3ad7f127cf8a83782c116d9
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/25098
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv Text based classification of companies in CrunchBase
title Text based classification of companies in CrunchBase
author Batista, F.
author_facet Batista, F.
João P. Carvalho
author_role author
author2 João P. Carvalho
author2_role author
dc.contributor.author.fl_str_mv Batista, F.
João P. Carvalho
dc.subject.por.fl_str_mv Text classification
Fuzzy fingerprints
Text mining
Crunchbase
Document classification
publishDate 2015
dc.date.none.fl_str_mv 2015-01-01T00:00:00Z
2015
2022-04-08T09:25:46Z
2022-04-08T10:22:26Z
dc.type.driver.fl_str_mv conference object
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/25098
url http://hdl.handle.net/10071/25098
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 978-1-4673-7428-6
1544-5615
10.1109/FUZZ-IEEE.2015.7337892
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv IEEE
publisher.none.fl_str_mv IEEE
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833597351457980416