Exploring events and distributed representations of text in multi-document summarization

Bibliographic Details
Main Author: Marujo, L.
Publication Date: 2016
Other Authors: Ling, W., Ribeiro, R., Gershman, A., Carbonell, J., de Matos, D., Neto, J. P.
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10071/11022
Summary: In this article, we explore an event detection framework to improve multi-document summarization. Our approach is based on a two-stage single-document method that extracts a collection of key phrases, which are then used in a centrality-as-relevance passage retrieval model. We explore how to adapt this single-document method into multi-document summarization methods that are able to use event information. The event detection method is based on Fuzzy Fingerprint, a supervised method trained on documents with annotated event tags. To cope with different terms being used to describe the same event, we explore distributed representations of text in the form of word embeddings, which contributed to improving the summarization results. The proposed summarization methods are based on the hierarchical combination of single-document summaries. Both the automatic evaluation and the human study show that these methods improve upon current state-of-the-art multi-document summarization systems on two mainstream evaluation datasets, DUC 2007 and TAC 2009, with relative ROUGE-1 improvements of 16% on TAC 2009 and 17% on DUC 2007.
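The summary reports its results as relative ROUGE-1 improvements. To make that figure concrete, here is a minimal sketch of ROUGE-1 recall (clipped unigram overlap with a single reference summary) and of the relative-improvement arithmetic. It assumes plain lowercase whitespace tokenization and one reference per summary; it is not the official ROUGE toolkit and not the authors' evaluation setup, and the scores in the usage lines are made-up numbers chosen only to show the arithmetic, not results from the paper.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall of a candidate summary against one reference summary.

    Each reference unigram counts as matched at most as many times as it
    appears in the candidate (clipped counts). Lowercase whitespace
    splitting is a simplifying assumption; the official ROUGE toolkit
    applies its own preprocessing (e.g. optional stemming, stopword removal).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for tokens the candidate never uses.
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / total


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, e.g. 0.16 means +16%."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - baseline) / baseline


if __name__ == "__main__":
    reference = "the storm hit the coast on monday"
    candidate = "a storm hit the coast"
    # 4 of the 7 reference unigrams are covered (the, storm, hit, coast).
    print(round(rouge1_recall(candidate, reference), 3))  # → 0.571
    # Hypothetical scores: 0.348 vs. a 0.300 baseline is a 16% relative gain.
    print(round(relative_improvement(0.348, 0.300), 2))  # → 0.16
```

Note that a relative improvement is measured against the baseline score, so a 16% relative gain can correspond to a much smaller absolute difference in ROUGE-1.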
Subjects: Multi-document summarization; Extractive summarization; Event detection; Distributed representations of text
Record dates: 2016-03-04T15:03:33Z; 2016-01-01T00:00:00Z; 2019-03-28T16:29:57Z
Version: Published version
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2015.11.005
Rights: Open access
File format: PDF (application/pdf)
Publisher: Elsevier Science BV