Optimizing digital archiving: An artificial intelligence approach for OCR error correction
Main Author: | Fernandes, Bruno Daniel Alho |
---|---|
Publication Date: | 2023 |
Format: | Master thesis |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10362/152939 |
Summary: | Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics |
id |
RCAP_8f28e33044bfbca8123bd3efd91675ed |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/152939 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
Optimizing digital archiving: An artificial intelligence approach for OCR error correction Optical Character Recognition Machine Translation Neural Networks Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics This thesis addresses the knowledge gap regarding effective ways to correct OCR errors, and the importance of training datasets of adequate size and quality for improving the efficiency of OCR on digital documents. The main goal is to examine the trade-offs among input size, performance, and time efficiency when sourcing data, and to propose a new design that includes a machine translation model to automate the correction of errors introduced by OCR scanning. The study implemented various LSTM models, with different thresholds, to recover from errors generated by OCR systems. Although the results did not surpass the performance of existing OCR systems, owing to dataset size limitations, a step forward was achieved: a relationship between performance and input size was established, providing meaningful insights for the future optimisation of digital archiving systems.
This dissertation creates a new approach to dealing with OCR problems and implementation considerations, which can be followed further to optimise the efficiency and results of digital archive systems. Henriques, Roberto André Pereira RUN Fernandes, Bruno Daniel Alho 2024-04-13T00:32:15Z 2023-04-13 2023-04-13T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis application/pdf http://hdl.handle.net/10362/152939 TID:203273451 eng info:eu-repo/semantics/openAccess reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP 2024-05-22T18:11:34Z oai:run.unl.pt:10362/152939 Portal Agregador ONG https://www.rcaap.pt/oai/openaire info@rcaap.pt opendoar:https://opendoar.ac.uk/repository/7160 2025-05-28T17:42:04.952743 Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia false |
dc.title.none.fl_str_mv |
Optimizing digital archiving: An artificial intelligence approach for OCR error correction |
title |
Optimizing digital archiving: An artificial intelligence approach for OCR error correction |
spellingShingle |
Optimizing digital archiving: An artificial intelligence approach for OCR error correction Fernandes, Bruno Daniel Alho Optical Character Recognition Machine Translation Neural Networks |
author |
Fernandes, Bruno Daniel Alho |
author_facet |
Fernandes, Bruno Daniel Alho |
author_role |
author |
dc.contributor.none.fl_str_mv |
Henriques, Roberto André Pereira RUN |
dc.contributor.author.fl_str_mv |
Fernandes, Bruno Daniel Alho |
dc.subject.por.fl_str_mv |
Optical Character Recognition Machine Translation Neural Networks |
topic |
Optical Character Recognition Machine Translation Neural Networks |
description |
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-04-13 2023-04-13T00:00:00Z 2024-04-13T00:32:15Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/152939 TID:203273451 |
url |
http://hdl.handle.net/10362/152939 |
identifier_str_mv |
TID:203273451 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833596902750289920 |