Researchers at Amazon propose “AdaMix”, an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data


When training a deep neural network for visual classification, it is critical to protect privacy by limiting the amount of information that can be extracted about each training sample. Differential Privacy (DP) is a theoretical framework that provides strong guarantees on the maximum amount of information an attacker can obtain about any given training example. Specifically, DP lets users choose the desired trade-off between privacy and accuracy through a privacy parameter, whose appropriate value often depends on the application context.
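For readers who want the guarantee spelled out, the block below gives the standard (ε, δ) formulation of DP. The article itself does not say which variant the paper uses, so the symbols M, D, D', S, ε, and δ are introduced here for illustration only.

```latex
% A randomized training mechanism M is (\varepsilon, \delta)-differentially private
% if, for every pair of datasets D and D' that differ in a single training sample
% and for every set S of possible outputs,
\[
  \Pr\big[\, M(D) \in S \,\big] \;\le\; e^{\varepsilon} \, \Pr\big[\, M(D') \in S \,\big] + \delta .
\]
% Smaller \varepsilon (and \delta) means stronger privacy and, typically, lower accuracy.
```

Here ε plays the role of the privacy parameter mentioned above: it bounds how much the presence or absence of any single sample can change the distribution of the trained model.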

It is difficult to train large machine learning models while guaranteeing a high level of privacy for each sample. In practice, however, one often has access to a pool of data for which there are no privacy concerns. This could be a synthetic dataset or a dataset released for public use. Such public data is distinct from the private data, whose confidentiality must be strictly maintained. In particular, using large amounts of generic public data for pre-training has recently made it possible to develop language models that satisfy DP on the target task while maintaining near-state-of-the-art performance.

Avoiding the use of private data altogether is the most definitive approach to privacy protection, and recent research has suggested numerous strategies for doing so. For example, with zero-shot learning, one can train a visual model using public data from another modality (e.g. text) without ever seeing the private data. More generally, one can obtain or create a few labeled public examples from the task distribution and perform few-shot learning on them, again without using any private data.

Ignoring the private data entirely is not a desirable privacy technique, however, since a slight domain shift can occur between the public and private data. This raises the question of how private data and modest amounts of public data can be used together to improve the trade-off between accuracy and privacy.

To address this, Amazon researchers recently departed from the setting studied in most DP papers by considering public data sources that are labeled with the same labels as the target task. This setting is known as MixDP, or Mixed Differential Privacy. To tackle MixDP, the researchers proposed using the public data to build a few-shot or zero-shot classifier for the target task before private fine-tuning.
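The two-phase recipe (train on the small public set first, then fine-tune privately) can be illustrated with a toy logistic-regression model. This is a minimal sketch, not the authors' implementation: the function names, hyperparameters, and the use of full-batch noisy gradient descent are all assumptions made for illustration, and no privacy accounting is performed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_sample_grads(w, X, y):
    """Per-sample logistic-loss gradients, one row per example."""
    return (sigmoid(X @ w) - y)[:, None] * X

def public_warm_start(X_pub, y_pub, steps=200, lr=0.5):
    """Phase 1: ordinary (non-private) gradient descent on the small public set."""
    w = np.zeros(X_pub.shape[1])
    for _ in range(steps):
        w -= lr * per_sample_grads(w, X_pub, y_pub).mean(axis=0)
    return w

def private_fine_tune(w, X_priv, y_priv, clip=1.0, noise_multiplier=1.0,
                      steps=50, lr=0.5, seed=0):
    """Phase 2: noisy gradient descent with per-sample clipping on the private set.

    Hypothetical hyperparameters; a real system would also track the
    cumulative privacy budget across steps, which is omitted here.
    """
    rng = np.random.default_rng(seed)
    w = w.copy()  # leave the public initialization untouched
    n = len(X_priv)
    for _ in range(steps):
        g = per_sample_grads(w, X_priv, y_priv)
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g * np.minimum(1.0, clip / np.maximum(norms, 1e-12))  # each row norm <= clip
        noise = rng.normal(0.0, noise_multiplier * clip, size=w.shape)
        w -= lr * (g.sum(axis=0) + noise) / n
    return w
```

A typical call would be `w = private_fine_tune(public_warm_start(X_pub, y_pub), X_priv, y_priv)`. Starting from the public solution means the private phase only has to correct for the domain shift rather than learn the task from scratch.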

The researchers showed that, even with a modest amount of public data, significant gains are possible in the MixDP setting compared to training with only private or only public data. To achieve this, they adapted existing DP training algorithms to the mixed setting, leading to the development of AdaMix, a method for MixDP that uses public data to tune and adapt all key phases of private training, in particular model initialization, gradient clipping, and projection onto a low-dimensional subspace.
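To make the roles of clipping and projection concrete, here is a rough NumPy sketch of a single private gradient step in which public gradients set the clipping threshold and define the subspace. The article does not describe how AdaMix chooses either, so the quantile rule, the SVD-based subspace, and all function names below are assumptions for illustration rather than the paper's algorithm.

```python
import numpy as np

def public_subspace(public_grads, k):
    """Orthonormal basis (d x k) spanning the top-k directions of the public gradients."""
    _, _, vt = np.linalg.svd(public_grads, full_matrices=False)
    return vt[:k].T

def adaptive_clip_threshold(public_grads, quantile=0.9):
    """Clipping norm chosen as a quantile of public gradient norms.

    Only public data is used, so this choice spends no privacy budget.
    """
    return float(np.quantile(np.linalg.norm(public_grads, axis=1), quantile))

def private_step(private_grads, basis, clip, noise_multiplier, rng):
    """Clipped, noised, and projected average of per-sample private gradients."""
    norms = np.linalg.norm(private_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    clipped = private_grads * scale                     # each row has norm <= clip
    summed = clipped.sum(axis=0)
    noisy = summed + rng.normal(0.0, noise_multiplier * clip, size=summed.shape)
    projected = basis @ (basis.T @ noisy)               # post-processing: keep public subspace only
    return projected / len(private_grads)
```

In this sketch the projection is applied after the noise is added, so it counts as post-processing of an already private quantity; it discards the noise components that fall outside the subspace suggested by the public gradients.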

In visual classification tasks, the long tails of the data are crucial for strong classification performance. Because DP is a worst-case framework, outliers and long tails have a significant impact on it. MixDP mitigates the problem by allowing public data to be collected so that each subpopulation is adequately covered. The researchers proved the convergence of the algorithm, along with a new, stronger bound for the strongly convex case. This also allowed the team to compare the algorithm’s utility with that of its non-private counterpart.


In computer vision, differentially private models often perform worse than non-private models for realistic privacy parameters. Using AdaMix in the MixDP learning setting, the Amazon researchers demonstrated that, given a modest amount of public data, it is possible to build networks that achieve better accuracy than fully private training without compromising data confidentiality. The team hopes that more extensive studies in this area will help bring private models to more computer vision applications.

This article was written as a summary by Marktechpost Staff based on the paper 'Mixed Differential Privacy in Computer Vision'. All credit for this research goes to the researchers on this project. Check out the paper.
