Term Weighting based on Document Revision History
Main Author: | |
---|---|
Publication Date: | 2011 |
Other Authors: | , |
Format: | Article |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://repositorio.inesctec.pt/handle/123456789/2237 http://dx.doi.org/10.1002/asi.21597 |
Summary: | In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our [...] |
id |
RCAP_506937006da44b7b97929176eea9ba9c |
---|---|
oai_identifier_str |
oai:repositorio.inesctec.pt:123456789/2237 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
Term Weighting based on Document Revision HistoryIn real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our [...]2017-11-16T13:23:31Z2011-01-01T00:00:00Z2011info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://repositorio.inesctec.pt/handle/123456789/2237http://dx.doi.org/10.1002/asi.21597engSérgio NunesCristina RibeiroGabriel Davidinfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2024-10-12T02:19:54Zoai:repositorio.inesctec.pt:123456789/2237Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T18:56:28.707834Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse |
dc.title.none.fl_str_mv |
Term Weighting based on Document Revision History |
title |
Term Weighting based on Document Revision History |
spellingShingle |
Term Weighting based on Document Revision History Sérgio Nunes |
title_short |
Term Weighting based on Document Revision History |
title_full |
Term Weighting based on Document Revision History |
title_fullStr |
Term Weighting based on Document Revision History |
title_full_unstemmed |
Term Weighting based on Document Revision History |
title_sort |
Term Weighting based on Document Revision History |
author |
Sérgio Nunes |
author_facet |
Sérgio Nunes Cristina Ribeiro Gabriel David |
author_role |
author |
author2 |
Cristina Ribeiro Gabriel David |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Sérgio Nunes Cristina Ribeiro Gabriel David |
description |
In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our [...] |
publishDate |
2011 |
dc.date.none.fl_str_mv |
2011-01-01T00:00:00Z 2011 2017-11-16T13:23:31Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.inesctec.pt/handle/123456789/2237 http://dx.doi.org/10.1002/asi.21597 |
url |
http://repositorio.inesctec.pt/handle/123456789/2237 http://dx.doi.org/10.1002/asi.21597 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833597767197392896 |