E-Thesis

Gradient Leakage and Protection for Federated Learning / HANCHI REN

Swansea University Author: HANCHI REN

Full text not available from this repository: check for access using links below.

DOI (Published version): 10.23889/SUthesis.63648

Abstract

In recent years, data privacy has become a critical issue in the field of Machine Learning (ML), given the significant amount of sensitive data involved in training and inference processes. Several approaches have been developed to address this challenge, including cryptography and collaborative traini...

Published: Swansea, Wales, UK 2023
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
Supervisor: Xie, Xianghua.
URI: https://cronfa.swan.ac.uk/Record/cronfa63648
first_indexed 2023-06-15T08:29:35Z
last_indexed 2024-11-15T18:02:01Z
id cronfa63648
recordtype RisThesis
entry 2023-06-15
keywords Deep Learning, Federated Learning, Gradient Leakage
publishedDate 2023-05-12
placeOfPublication Swansea, Wales, UK
notes Due to Embargo and/or Third Party Copyright restrictions, this thesis is not available via this service.
lastEdited 2023-10-03T15:56:47.2438373
Created 2023-06-15T09:26:34.1349752
path Faculty of Science and Engineering / School of Mathematics and Computer Science - Computer Science
title Gradient Leakage and Protection for Federated Learning
author_id_str_mv cb765a792db8eebb5362902aaeafa24d
author_id_fullname_str_mv cb765a792db8eebb5362902aaeafa24d_***_HANCHI REN
author HANCHI REN
format E-Thesis
publishDate 2023
institution Swansea University
doi_str_mv 10.23889/SUthesis.63648
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science, Faculty of Science and Engineering
document_store_str 0
active_str 0
description In recent years, data privacy has become a critical issue in the field of Machine Learning (ML), given the significant amount of sensitive data involved in training and inference processes. Several approaches have been developed to address this challenge, including cryptography and collaborative training. Cryptographic techniques, such as Homomorphic Encryption (HE) and Differential Privacy (DP), have gained popularity due to their ability to protect sensitive data during computation. HE allows computations to be performed directly on encrypted data without the need to decrypt it, thus ensuring privacy while still providing accurate results. DP, on the other hand, adds random noise to data to protect individuals’ privacy while preserving statistical accuracy. Collaborative training methods, such as Secure Multi-Party Computation (MPC), Distributed Learning, and Federated Learning (FL), aim to address privacy concerns by enabling secure local computation. In MPC, parties collaborate to compute a function without revealing their inputs to each other, making it suitable for privacy-preserving ML tasks. Distributed Learning allows data to be distributed across multiple devices or nodes, reducing the risk of data breaches while still achieving accurate results. FL enables the training of ML models on decentralised data without transferring raw data to a central location. While these techniques have proven effective in protecting sensitive data, they also have some limitations. For instance, HE and DP may be computationally expensive, which can hinder their widespread adoption. Additionally, collaborative training methods may require significant communication overhead and synchronisation, which can affect training efficiency.
Collaborative training through gradient exchange has been widely used in Deep Learning (DL) as a secure way to train a robust model. However, recent research has shown that this method may not be entirely secure. In fact, sensitive information can be recovered from the shared gradient, compromising privacy and exposing valuable data to theft by malicious actors. Various studies have demonstrated that the publicly shared gradient can reveal sensitive information about the training data, such as the presence of specific individuals or properties. This can lead to significant privacy breaches, especially in sensitive areas such as healthcare or finance. As the demand for privacy-preserving ML grows, there is a need for further research and development of effective and robust techniques to ensure data privacy during collaborative training.
This thesis investigates how to reconstruct private input data from the publicly shared gradient and how to prevent gradient leakage by means of a gradient-sharing protocol and a private key-lock module. We first show that in an FL system, private image data can be easily retrieved from the shared gradient through our proposed Generative Regression Neural Network (GRNN). Our attack formulates the problem as a regression task and optimises two branches of the generative model by minimising the gradient distance. The findings of our study demonstrate that even seemingly innocuous shared information can lead to the recovery of sensitive data. This highlights the importance of developing robust privacy-preserving techniques to protect sensitive information during collaborative ML. Our proposed GRNN attack serves as a wake-up call to the ML community to address the privacy concerns associated with FL.
Our subsequent study found that the generalisation ability of joint models in FL is poor on non-Independent and Identically Distributed (Non-IID) data, particularly when the Federated Averaging (FedAvg) strategy is used, leading to weight divergence. To address this issue, we propose a novel boosting algorithm for FL that tackles both the generalisation and gradient leakage problems and converges faster in gradient-based optimisation. The algorithm aggregates models trained on subsets of data, which addresses the weight divergence issue. It uses an adaptive weighting strategy, where the weight of each model is adjusted based on its performance, with better-performing models receiving more weight. Additionally, we introduce a privacy-preserving component to the algorithm, where local models are encrypted to reduce the risk of gradient leakage. The algorithm shows promising results on both issues. The findings of our study highlight the importance of developing robust techniques to improve the performance of FL models and ensure data privacy during collaborative ML.
Finally, our research proposes a new approach to defending against gradient leakage attacks in FL through a private key-lock module (FedKL). This method secures arbitrary model architectures with a private key-lock module, where only the locked gradient is transferred to the parameter server for aggregating the global model. The proposed FedKL method is designed to be robust against gradient leakage attacks, ensuring that sensitive information cannot be reconstructed from the shared gradient. The key-lock module is trained in such a way that, without the private information of the module, it becomes infeasible to reconstruct training data from the shared gradient. Furthermore, the inference performance of the global model is significantly undermined without the key-lock module, making it an integral part of the model architecture. Our theoretical analysis explains why the gradient can leak private information and how the proposed FedKL method defends against the attack. The proposed FedKL method provides a new perspective on defending against gradient leakage attacks in FL, enhancing the security and privacy of sensitive data.
We will continue to work on privacy-preserving FL. In our previous work, we have identified a number of follow-up research points. Examples include gradient leakage for Natural Language Processing (NLP), an adaptive gradient aggregation method and partial gradient leakage. Since we have theoretically proven that private information is carried by the gradients, finding state-of-the-art methods of stealing data and defending against leakage is a long-term study in safeguarding privacy.
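The abstract's claim that a shared gradient carries private information can be illustrated with a much simpler case than the GRNN attack it describes. The sketch below is not the thesis method. It assumes a single fully connected layer with a bias, a squared-error loss, and a batch of one sample, where the weight gradient is the outer product of the bias gradient and the input, so the input can be read off by division. All function and variable names are hypothetical and chosen for illustration only.

```python
import random

def linear_forward(W, b, x):
    """Compute z = W x + b for one sample."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def mse_gradients(W, b, x, target):
    """Gradients of L = 0.5 * ||W x + b - target||^2 with respect to W and b."""
    z = linear_forward(W, b, x)
    delta = [z_i - t_i for z_i, t_i in zip(z, target)]    # dL/dz
    grad_W = [[d_i * x_j for x_j in x] for d_i in delta]  # dL/dW = delta x^T
    grad_b = delta[:]                                     # dL/db = delta
    return grad_W, grad_b

def recover_input(grad_W, grad_b, eps=1e-12):
    """Recover x from the shared gradients: row i of grad_W equals grad_b[i] * x."""
    # Use the row with the largest bias gradient for numerical stability.
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    if abs(grad_b[i]) < eps:
        raise ValueError("bias gradient is zero; this shortcut does not apply")
    return [g / grad_b[i] for g in grad_W[i]]

# Assumed toy setup: 8 input features, 3 outputs, one private sample.
random.seed(0)
x_private = [random.random() for _ in range(8)]
W = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
b = [random.uniform(-1, 1) for _ in range(3)]
target = [1.0, 0.0, 0.0]

# The client would share only grad_W and grad_b, never x_private.
grad_W, grad_b = mse_gradients(W, b, x_private, target)
x_rec = recover_input(grad_W, grad_b)
max_err = max(abs(a - r) for a, r in zip(x_private, x_rec))
print("max reconstruction error:", max_err)
```

This shortcut only holds under the stated assumptions. With larger batches or deeper models the relation is no longer a simple division, which is the setting where optimisation-based attacks that minimise a gradient distance, as described in the abstract, become relevant.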
published_date 2023-05-12T20:22:51Z