Caraterização de um corpus jornalístico português
| Main Author: | Henrique Manuel Martins Moreira Teixeira de Sousa |
|---|---|
| Publication Date: | 2015 |
| Format: | Master thesis |
| Language: | por |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | https://hdl.handle.net/10216/83538 |
| Summary: | In this dissertation we organize and characterize a news article archive from the Portuguese online journal JornalismoPortoNet (JPN), creating a text corpus with content from several authors and topics. A corpus is a collection of texts on which one can perform statistical analysis or hypothesis testing, mainly in the field of linguistics. Growing computing power eases the processing of large corpora (searching, treatment, selection, etc.). This corpus aims to be a true representation of the journalistic text practised by JPN, collecting titles, subtitles, authors, related news, categories and publication dates, while including a small part referring to the readers' opinions (news comments). The corpus will be annotated with the POS tags used and the named entities mentioned in the text. Following this, an in-depth analysis will be performed of the morphological and categorical composition of the news articles, including research into relationships between news items and the differences among the large array of authors with varying experience. There will also be a characterization of the named entities in the text, categorizing them as people, locations or organizations and revealing the relationship network between these entities. Finally, the public's reception of the journalistic material is analysed, be it in page views or readers' comments. |
| id |
RCAP_58405152b48ac0fc10f819f1b1d06a02 |
|---|---|
| oai_identifier_str |
oai:repositorio-aberto.up.pt:10216/83538 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| dc.title.none.fl_str_mv |
Caraterização de um corpus jornalístico português |
| title |
Caraterização de um corpus jornalístico português |
| author |
Henrique Manuel Martins Moreira Teixeira de Sousa |
| author_facet |
Henrique Manuel Martins Moreira Teixeira de Sousa |
| author_role |
author |
| dc.contributor.author.fl_str_mv |
Henrique Manuel Martins Moreira Teixeira de Sousa |
| dc.subject.por.fl_str_mv |
Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
| topic |
Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
| description |
In this dissertation we organize and characterize a news article archive from the Portuguese online journal JornalismoPortoNet (JPN), creating a text corpus with content from several authors and topics. A corpus is a collection of texts on which one can perform statistical analysis or hypothesis testing, mainly in the field of linguistics. Growing computing power eases the processing of large corpora (searching, treatment, selection, etc.). This corpus aims to be a true representation of the journalistic text practised by JPN, collecting titles, subtitles, authors, related news, categories and publication dates, while including a small part referring to the readers' opinions (news comments). The corpus will be annotated with the POS tags used and the named entities mentioned in the text. Following this, an in-depth analysis will be performed of the morphological and categorical composition of the news articles, including research into relationships between news items and the differences among the large array of authors with varying experience. There will also be a characterization of the named entities in the text, categorizing them as people, locations or organizations and revealing the relationship network between these entities. Finally, the public's reception of the journalistic material is analysed, be it in page views or readers' comments. |
| publishDate |
2015 |
| dc.date.none.fl_str_mv |
2015-07-20 2015-07-20T00:00:00Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/10216/83538 TID:201304805 |
| url |
https://hdl.handle.net/10216/83538 |
| identifier_str_mv |
TID:201304805 |
| dc.language.iso.fl_str_mv |
por |
| language |
por |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833600169646489600 |