Searching dynamic Web pages with semi-structured contents
Main Author: | Filipe Silva |
---|---|
Publication Date: | 2003 |
Other Authors: | Armando Oliveira, Lígia M. Ribeiro, Gabriel David |
Format: | Book |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | https://repositorio-aberto.up.pt/handle/10216/621 |
Summary: | At present, information systems (IS) in higher education are usually supported by databases (DB) and accessed through a Web interface. This is the case with SiFEUP, the IS of the Engineering Faculty of the University of Porto (FEUP). The typical SiFEUP user sees the system as a collection of Web pages and is unaware that most of them do not exist as actual HTML files stored on a server; instead, they are HTML code generated on the fly by a dedicated program that accesses the DB and delivers the most up-to-date information to the user's desktop. Typical search engines either do not index dynamically generated Web pages at all, or index only those explicitly linked from a static page, without following the links those dynamic pages may contain. In this paper we describe the development of a search facility for SiFEUP, how the limitations on indexing dynamic Web pages were circumvented, and an evaluation of the results obtained. The solution uses a locally developed crawler and the Oracle Text full-text indexer, plus meta-information, either drawn automatically from the DB or added manually, to improve the relevance factor calculation. |
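The summary's key point is that generic search engines stop at dynamically generated pages instead of following the links those pages contain. The following is a minimal Python sketch of a crawler that does follow them: a breadth-first traversal restricted to one host, in which a URL with a different query string counts as a different page. It is not the locally developed SiFEUP crawler, whose code is not part of this record. The `crawl` and `LinkAndTextParser` names, the caller-supplied `fetch` function and the `sifeup.example` URLs are hypothetical, and handing the extracted text to Oracle Text is not shown.

```python
# Hypothetical sketch; not the SiFEUP crawler described in the paper.
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collects <a href> targets and the non-blank text of a single page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 1000) -> Dict[str, str]:
    """Breadth-first crawl of a single host that also follows links found
    inside dynamically generated pages (URLs with query strings).
    Returns {url: extracted text}, to be handed to a full-text indexer."""
    host = urlparse(start_url).netloc
    start, _ = urldefrag(start_url)
    queue = deque([start])
    seen = {start}
    pages: Dict[str, str] = {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreachable or non-HTML resource: skip it
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        pages[url] = " ".join(parser.text_parts)

        for href in parser.links:
            # Resolve relative links and drop #fragments, but keep ?query
            # strings: for a generated page they select the DB content shown.
            absolute, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(absolute)
            if (parsed.scheme in ("http", "https")
                    and parsed.netloc == host
                    and absolute not in seen):
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Usage with an in-memory stand-in for the site (hypothetical URLs).
site = {
    "http://sifeup.example/index.html": '<a href="pessoa?id=1">Person 1</a>',
    "http://sifeup.example/pessoa?id=1":
        '<p>Jane Doe</p><a href="curso?id=7#top">Course</a>',
    "http://sifeup.example/curso?id=7":
        '<h1>Informatics</h1><a href="http://other.example/">External</a>',
}
pages = crawl("http://sifeup.example/index.html", site.get)
for url, text in pages.items():
    print(url, "->", text)
```

Dropping `#fragments` while keeping `?query` strings is the central design choice: for a generated page, `?id=7` selects different database content, whereas `#top` only scrolls within the same page. Passing `fetch` as a parameter keeps the sketch testable offline; a real deployment would substitute an HTTP client and add politeness delays and error handling.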
Record ID: | RCAP_2b4bcdb34cf966a05a3f08731fd16f6f |
---|---|
OAI identifier: | oai:repositorio-aberto.up.pt:10216/621 |
Network acronym: | RCAP |
Repository ID: | https://opendoar.ac.uk/repository/7160 |
Author: | Filipe Silva |
Co-authors: | Armando Oliveira, Lígia M. Ribeiro, Gabriel David |
Subject: | Electrical engineering, Electronic engineering, Information engineering (Portuguese: Engenharia electrotécnica, electrónica e informática) |
Publication status: | Published version (info:eu-repo/semantics/publishedVersion) |
Access rights: | Open access (info:eu-repo/semantics/openAccess) |
File format: | application/msword |
Repository: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
Repository e-mail: | info@rcaap.pt |