Privacy-preserving machine learning on Apache Spark

Bibliographic Details
Main Author: Brito, Cláudia Vanessa Martins
Publication Date: 2023
Other Authors: Ferreira, Pedro G.; Portela, Bernardo L.; Oliveira, Rui Carlos Mendes de; Paulo, Joao T.
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://hdl.handle.net/1822/90761
Summary: The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.
id RCAP_379ba58752b58d3dccf88831b6b4fa1b
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/90761
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
Funding: This work was supported by FCT - Portuguese Foundation for Science and Technology through the Ph.D. grant DFA/BD/146528/2018 and realized within the scope of the project LA/P/0063/2020.
dc.contributor.none.fl_str_mv Universidade do Minho
dc.subject.por.fl_str_mv apache spark
distributed systems
Intel SGX
machine learning
Privacy-preserving
trusted execution environments
Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.relation.none.fl_str_mv Brito, C. V., Ferreira, P. G., Portela, B. L., Oliveira, R. C., & Paulo, J. T. (2023). Privacy-Preserving Machine Learning on Apache Spark. IEEE Access. Institute of Electrical and Electronics Engineers (IEEE). http://doi.org/10.1109/access.2023.3332222
10.1109/ACCESS.2023.3332222
https://ieeexplore.ieee.org/document/10314994
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv IEEE
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.mail.fl_str_mv info@rcaap.pt