
Optimizing digital archiving: An artificial intelligence approach for OCR error correction

Bibliographic Details
Main Author: Fernandes, Bruno Daniel Alho
Publication Date: 2023
Format: Master's thesis
Language: English
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10362/152939
Summary: Project work presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
Record ID: RCAP_8f28e33044bfbca8123bd3efd91675ed
OAI identifier: oai:run.unl.pt:10362/152939
Network: RCAP
Repository ID: https://opendoar.ac.uk/repository/7160
Abstract: This thesis addresses the knowledge gap regarding effective ways to correct OCR errors, and the importance of training datasets of adequate size and quality for improving the efficiency of OCR recognition in digital documents. The main goal is to examine the trade-offs between three dimensions of the sourced data, namely input size, performance, and time efficiency, and to propose a new design that includes a machine translation model to automate the correction of errors introduced by OCR scanning. The study implemented several LSTM models, with different thresholds, to recover errors generated by OCR systems. Although the results did not surpass the performance of existing OCR systems because of dataset size limitations, a step forward was achieved: a relationship between performance and input size was established, providing meaningful insights for optimising future digital archiving systems.
This dissertation proposes a new approach to dealing with OCR problems, together with implementation considerations, that can be followed in further work to optimise the efficiency and results of digital archive systems.
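The abstract compares the LSTM-based corrector with the output of existing OCR systems but does not state how performance was measured. Character error rate (CER), the edit distance between a system's output and the ground-truth text divided by the length of the ground truth, is one common choice for this kind of comparison. The sketch below assumes CER as the metric; the function names `levenshtein` and `cer` and the sample strings are hypothetical and are not taken from the thesis.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the edit distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance normalised by reference length.
    Assumption: CER is used here for illustration; the thesis metric is not stated."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings for illustration only, not data from the thesis.
    reference = "digital archive"   # ground truth
    ocr_output = "dlgital arch1ve"  # raw OCR text with two character errors
    corrected = "digital arch1ve"   # output of a correction model, one error left
    print(f"raw OCR CER:   {cer(ocr_output, reference):.3f}")
    print(f"corrected CER: {cer(corrected, reference):.3f}")
```

Under this assumption, a correction model improves on the OCR system only if its output has a lower CER than the raw OCR text for the same reference.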
Contributors: Henriques, Roberto André Pereira; RUN
Keywords: Optical Character Recognition; Machine Translation; Neural Networks
Dates: 2023-04-13; 2024-04-13T00:32:15Z
Version: info:eu-repo/semantics/publishedVersion
Identifier: TID:203273451
Rights: Open access (info:eu-repo/semantics/openAccess)
File format: application/pdf
Repository: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
Repository contact: info@rcaap.pt