Borealis AI researchers propose PUMA: Performance Unchanged Model Augmentation for Training Data Removal

This article is written as a summary by Marktechpost Staff based on the research paper 'PUMA: Performance Unchanged Model Augmentation for Training Data Removal'. All credit for this research goes to the researchers of this project. Check out the paper.


As the protection of personal data becomes increasingly important in many countries and territories, the relevant protection regulations allow individuals to withdraw their consent to the use of their data for data analysis and machine learning (ML) model training.

While retraining an ML model after deleting the marked data points is a viable method, frequent data removal requests inevitably impose a huge computational load on real-time ML infrastructures. In addition, cumulative data loss can lead to rapid performance degradation. Removing the characteristic properties of the marked data while maintaining model performance is therefore a crucial and difficult research problem.

Recent research has tried to solve the problem of data removal. These methods, however, tend to focus on specific machine learning algorithms and are difficult to apply to deep neural networks, which currently dominate ML research and applications. Other researchers have proposed two kinds of solutions: combining multiple ML models into an ensemble that is friendly to data removal, and explicitly computing the contribution of each training data point as an additive function to obtain a single-model solution. Unfortunately, such approaches are prohibitively expensive: both managing a large number of submodels and tracking the model training process are impractical in real-world scenarios.

Borealis AI researchers recently proposed Performance Unchanged Model Augmentation (PUMA), a novel approach to quickly deleting the characteristic features of tagged data points from a trained model without sacrificing performance.

In particular, the proposed PUMA framework models the influence of each training data point on the model with respect to several performance criteria. The remaining data points are then re-weighted, with the weights chosen through constrained optimization, to compensate for the adverse effects of removing the marked data.
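To make the idea concrete, here is a minimal sketch of influence scoring followed by a constrained re-weighting. It is not the authors' implementation: the ridge-regression model, the synthetic data, the single held-out-error criterion, and every variable name are assumptions chosen for illustration, and the constrained optimization is reduced to its simplest form (a minimum-norm solution to one linear equality constraint).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ridge-regression problem (all data here is made up for illustration).
n, n_val, d, reg = 200, 100, 5, 1.0
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_true + 0.1 * rng.normal(size=n)
X_val = rng.normal(size=(n_val, d))
y_val = X_val @ theta_true + 0.1 * rng.normal(size=n_val)

# Train: minimise sum_i 0.5*(x_i . theta - y_i)^2 + 0.5*reg*||theta||^2.
H = X.T @ X + reg * np.eye(d)            # Hessian of the training objective
theta = np.linalg.solve(H, X.T @ y)

# Per-point training gradients g_i, and the gradient of the criterion C
# (assumed here: half the mean squared error on a held-out set).
G = (X @ theta - y)[:, None] * X         # shape (n, d); row i is g_i
grad_C = X_val.T @ (X_val @ theta - y_val) / n_val

# Influence score s_i = -grad_C^T H^{-1} g_i: the first-order change in C
# per unit of extra weight placed on training point i.
scores = -G @ np.linalg.solve(H, grad_C)

marked = np.arange(10)                   # hypothetical points flagged for removal
remaining = np.arange(10, n)

# Removing the marked points shifts C by roughly -sum(scores[marked]).
# Choose the smallest (L2-norm) weight changes delta on the remaining points
# whose first-order effect cancels that shift:
#   minimise ||delta||^2  subject to  scores[remaining] @ delta == sum(scores[marked])
s_r = scores[remaining]
target = scores[marked].sum()
delta = s_r * target / (s_r @ s_r)       # closed-form minimum-norm solution

# To first order, removal plus re-weighting now leaves C unchanged.
net_first_order_change = s_r @ delta - target
```

With several criteria, the single constraint would become one linear constraint per criterion. The guarantee here is first-order only, so the actual criterion value after patching can still move slightly.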

By linearly patching the original model, removing the unique properties of the marked data points while re-weighting the rest, PUMA is able to preserve model performance. In their experiments, the researchers compared PUMA with previous data removal algorithms and showed that it can successfully fool a membership inference attack while resisting performance degradation.
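The sketch below illustrates what a linear patch can look like in the simplest possible setting. It covers only the removal half of the idea and is an assumption-laden toy rather than PUMA itself: it uses ridge regression on synthetic data, where the loss is quadratic, so a single Newton step taken from the trained parameters reproduces retraining exactly. For deep networks such a step would only be an approximation, and the re-weighting of the remaining points described in the article is omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data, made up for illustration.
n, d, reg = 300, 4, 0.5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)


def fit_ridge(X, y, reg):
    """Closed-form minimiser of sum_i 0.5*(x_i . theta - y_i)^2 + 0.5*reg*||theta||^2."""
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ y)


theta = fit_ridge(X, y, reg)

marked = np.zeros(n, dtype=bool)
marked[:20] = True                       # hypothetical removal request

# Sum of the marked points' loss gradients at the trained parameters.
g_marked = X[marked].T @ (X[marked] @ theta - y[marked])

# Hessian of the objective restricted to the remaining points.
H_rem = X[~marked].T @ X[~marked] + reg * np.eye(d)

# Linear patch: one Newton step from the already-trained parameters,
# using only the trained model and the data, not the training history.
theta_patched = theta + np.linalg.solve(H_rem, g_marked)

# Reference: retrain from scratch without the marked points.
theta_retrained = fit_ridge(X[~marked], y[~marked], reg)
max_gap = np.max(np.abs(theta_patched - theta_retrained))
```

Here `max_gap` sits at floating-point noise level because the objective is exactly quadratic. The point of the sketch is that the patch is a single additive correction to the existing parameters, which is why no part of the original training run needs to be replayed.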


In summary, Borealis AI researchers have proposed PUMA, a data removal approach that removes the distinctive features of marked training data points from a trained ML model while preserving the model's performance against a set of performance criteria. PUMA has a significant advantage over approaches that require access to the model training process, because it does not constrain the way the model is developed. Across a range of experiments, the researchers found that PUMA outperforms baseline techniques in effectiveness, efficiency, and the ability to preserve performance.

