Privacy-preserving machine learning on Apache Spark

Brito, Cláudia Vanessa Martins; Ferreira, Pedro G.; Portela, Bernardo L.; Oliveira, Rui Carlos Mendes de; Paulo, Joao T.

Bibliographic Details
Main Author:	Brito, Cláudia Vanessa Martins
Publication Date:	2023
Other Authors:	Ferreira, Pedro G., Portela, Bernardo L., Oliveira, Rui Carlos Mendes de, Paulo, Joao T.
Format:	Article
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	https://hdl.handle.net/1822/90761
Summary:	The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.

Item metadata

id	RCAP_379ba58752b58d3dccf88831b6b4fa1b
oai_identifier_str	oai:repositorium.sdum.uminho.pt:1822/90761
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
spelling	Privacy-preserving machine learning on Apache Spark apache spark distributed systems Intel SGX machine learning Privacy-preserving trusted execution environments Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks. This work was supported by FCT - Portuguese Foundation for Science and Technology through the Ph.D. grant DFA/BD/146528/2018 and realized within the scope of the project LA/P/0063/2020. IEEE Universidade do Minho Brito, Cláudia Vanessa Martins Ferreira, Pedro G. Portela, Bernardo L. Oliveira, Rui Carlos Mendes de Paulo, Joao T. 2023 2023-01-01T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article application/pdf https://hdl.handle.net/1822/90761 eng Brito, C. V., Ferreira, P. G., Portela, B. L., Oliveira, R. C., & Paulo, J. T. (2023). Privacy-Preserving Machine Learning on Apache Spark. IEEE Access. Institute of Electrical and Electronics Engineers (IEEE). http://doi.org/10.1109/access.2023.3332222 10.1109/ACCESS.2023.3332222 https://ieeexplore.ieee.org/document/10314994 info:eu-repo/semantics/openAccess reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP 2024-05-11T05:42:03Z oai:repositorium.sdum.uminho.pt:1822/90761 Portal Agregador ONG https://www.rcaap.pt/oai/openaire info@rcaap.pt opendoar:https://opendoar.ac.uk/repository/7160 2025-05-28T15:27:15.779151 Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia false
dc.title.none.fl_str_mv	Privacy-preserving machine learning on Apache Spark
title	Privacy-preserving machine learning on Apache Spark
spellingShingle	Privacy-preserving machine learning on Apache Spark Brito, Cláudia Vanessa Martins apache spark distributed systems Intel SGX machine learning Privacy-preserving trusted execution environments Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
title_short	Privacy-preserving machine learning on Apache Spark
title_full	Privacy-preserving machine learning on Apache Spark
title_fullStr	Privacy-preserving machine learning on Apache Spark
title_full_unstemmed	Privacy-preserving machine learning on Apache Spark
title_sort	Privacy-preserving machine learning on Apache Spark
author	Brito, Cláudia Vanessa Martins
author_facet	Brito, Cláudia Vanessa Martins Ferreira, Pedro G. Portela, Bernardo L. Oliveira, Rui Carlos Mendes de Paulo, Joao T.
author_role	author
author2	Ferreira, Pedro G. Portela, Bernardo L. Oliveira, Rui Carlos Mendes de Paulo, Joao T.
author2_role	author author author author
dc.contributor.none.fl_str_mv	Universidade do Minho
dc.contributor.author.fl_str_mv	Brito, Cláudia Vanessa Martins Ferreira, Pedro G. Portela, Bernardo L. Oliveira, Rui Carlos Mendes de Paulo, Joao T.
dc.subject.por.fl_str_mv	apache spark distributed systems Intel SGX machine learning Privacy-preserving trusted execution environments Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
topic	apache spark distributed systems Intel SGX machine learning Privacy-preserving trusted execution environments Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
description	The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.
publishDate	2023
dc.date.none.fl_str_mv	2023 2023-01-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/1822/90761
url	https://hdl.handle.net/1822/90761
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Brito, C. V., Ferreira, P. G., Portela, B. L., Oliveira, R. C., & Paulo, J. T. (2023). Privacy-Preserving Machine Learning on Apache Spark. IEEE Access. Institute of Electrical and Electronics Engineers (IEEE). http://doi.org/10.1109/access.2023.3332222 10.1109/ACCESS.2023.3332222 https://ieeexplore.ieee.org/document/10314994
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	IEEE
publisher.none.fl_str_mv	IEEE
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833595324262776832
