USC at ICML ’22 Conference

The International Conference on Machine Learning (ICML), a haven for machine learning innovation. Photo/iStock.

USC students and faculty will present their latest research at the July 17-21 International Conference on Machine Learning (ICML), a haven for machine learning innovation and the premier academic conference in the field. Each year, the conference brings together papers from industry and academic researchers on this branch of artificial intelligence, covering potential applications in everything from biology to robotics.

This year’s event in Baltimore marks the 39th ICML and will feature nine papers co-authored by USC students or professors in collaboration with companies such as Google, Amazon and Facebook. The papers cover a wide range of topics such as game theory, language models and neural network attribution.

We asked the authors to summarize their research and its potential impact. (Responses have been edited for clarity.)

Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games

Gabriele Farina (Carnegie Mellon University), Chung-Wei Lee (University of Southern California), Haipeng Luo (University of Southern California), Christian Kroer (Columbia University)

“We consider solving extensive-form games (EFGs), a general framework that models many strategy games, including card games like Texas Hold’em or Blackjack and board games like Monopoly or Chess. The main application is to develop better AI for playing these games. In this paper, we propose Kernelized Optimistic Multiplicative Weights Update (KOMWU), the first algorithm to simultaneously enjoy several important theoretical guarantees, including better convergence, milder dependence on the game size, and near-optimal ‘regret.’” Chung-Wei Lee
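For readers curious about the building block behind KOMWU, here is a minimal background sketch of the standard, non-kernelized optimistic multiplicative weights update on a probability simplex. It is not the paper's kernelized algorithm, and the function and parameter names are illustrative only.

```python
import numpy as np

def optimistic_mwu(losses, n_actions, eta=0.1):
    """Repeated play with the (vanilla) optimistic multiplicative weights update.

    losses: list of loss vectors, one np.ndarray of shape (n_actions,) per round,
            revealed only after the strategy for that round has been chosen.
    Returns the sequence of mixed strategies played.
    """
    x = np.full(n_actions, 1.0 / n_actions)  # start from the uniform strategy
    prev_loss = np.zeros(n_actions)          # optimistic prediction for round 1
    played = []
    for loss in losses:
        played.append(x.copy())
        # Optimism: treat the most recent loss as a prediction of the next one,
        # giving the recursion x_{t+1} ∝ x_t * exp(-eta * (2*loss_t - loss_{t-1})).
        x = x * np.exp(-eta * (2.0 * loss - prev_loss))
        x /= x.sum()                         # renormalize onto the simplex
        prev_loss = loss
    return played
```

As the title and summary suggest, the paper's contribution is to carry out an update of this kind over the exponentially large strategy space of an extensive-form game via a kernel trick, which this background sketch does not attempt.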

Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

Liyu Chen (University of Southern California), Rahul Jain (University of Southern California), Haipeng Luo (University of Southern California)

“This paper examines how to use reinforcement learning (RL) to learn a policy that maximizes the long-term average reward while satisfying certain constraints. For example, in logistics management, you want to minimize transportation costs while obeying traffic rules and meeting all deadlines. We propose a new algorithm that achieves better learning performance (as measured by a term called regret) compared to existing work. We are also the first in this direction to study a more general setting under the weakly communicating assumption, and we propose the first set of algorithms for this more general setting.” Liyu Chen

Improved no-regret algorithms for stochastic shortest path with linear MDP

Liyu Chen (University of Southern California), Rahul Jain (University of Southern California), Haipeng Luo (University of Southern California)

“This paper examines how goal-oriented tasks such as car navigation or robot manipulation can be solved with reinforcement learning (RL) when some kind of linear structure is imposed on the environment. The goal is to use this structure to make learning feasible. Studying this setting is an important step toward understanding reinforcement learning with function approximation (e.g., deep neural networks). We propose three algorithms in this direction. The first algorithm achieves state-of-the-art learning performance (as measured by a term called regret) and is computationally efficient. The second and third algorithms give other forms of regret guarantee, which is desirable for some tasks.” Liyu Chen

A rigorous study of the integrated gradients method and extensions to internal neuron attributions

Daniel Lundstrom (University of Southern California), Tianjian Huang (University of Southern California), Meisam Razaviyayn (University of Southern California, ISE)

“Deep neural networks are very powerful tools for prediction. For example, they can help doctors read medical scans or allow self-driving cars to interpret what their exterior cameras see. The internal workings of these models are so complex that even professionals have difficulty explaining them, and various tools have been developed to explain how neural networks arrive at their outputs. Our paper is an in-depth analysis of a popular attribution method, Integrated Gradients, which has been claimed to be the only method satisfying a desirable set of properties.

We show that establishing the uniqueness of Integrated Gradients is more difficult than previously thought, and we work toward justifying it by introducing another key property and then proving key results using this property. We also introduce an algorithm to help experts interpret the role of internal components, or neurons. With this algorithm, experts could understand which parts of the model react to a wheel when the model identifies an image of a car, for example.” Daniel Lundstrom
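For context, Integrated Gradients itself is a previously published attribution method; the sketch below shows only its standard Riemann-sum approximation on a toy differentiable model, not the paper's new analysis or its internal-neuron extension. The function names and the toy example are illustrative.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Riemann-sum (midpoint) approximation of Integrated Gradients attributions.

    grad_f: returns the gradient of the model output with respect to its input.
    x: the input being explained; baseline: a reference input (often all zeros).
    Returns one attribution score per input feature.
    """
    alphas = (np.arange(steps) + 0.5) / steps      # points along the straight path
    grads = sum(grad_f(baseline + a * (x - baseline)) for a in alphas)
    avg_grad = grads / steps
    return (x - baseline) * avg_grad               # scale by distance from baseline

# Toy check on a linear model f(z) = w @ z, whose exact attributions are w * x.
w = np.array([1.0, -2.0, 0.5])
x, baseline = np.array([2.0, 1.0, 4.0]), np.zeros(3)
print(integrated_gradients(lambda z: w, x, baseline))  # ≈ [ 2., -2.,  2.]
```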

No-regret learning in time-varying zero-sum games

Mengxiao Zhang (University of Southern California), Peng Zhao (Nanjing University), Haipeng Luo (University of Southern California), Zhi-Hua Zhou (Nanjing University)

“Learning through repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. In practice, however, the game is often not fixed but changes over time, due to changes in the environment and in the players’ strategies. Motivated by this, we focus on a natural but little-explored variant of this problem, where the game’s payoff matrix changes over time, possibly in an adversarial way.

We first discuss what the appropriate performance measures are for learning in non-stationary games and propose three natural and reasonable ones. Then we design a new parameter-free algorithm that simultaneously enjoys favorable guarantees under all three performance measures. These guarantees adapt to different non-stationarity measures of the payoff matrices and, importantly, recover the best known results when the payoff matrix is fixed. Empirical results further confirm the effectiveness of our algorithm.” Mengxiao Zhang

UNIREX: A unified learning framework for language model rationale extraction

Aaron Chan (University of Southern California), Maziar Sanjabi (Meta AI), Lambert Mathias (Facebook), Liang Tan (Facebook), Shaoliang Nie (Facebook), Xiaochang Peng, Xiang Ren (University of Southern California), Hamed Firooz (Facebook)

“Neural language models (NLMs), which make complex decisions based on natural language text, are the backbone of many modern AI systems. Nonetheless, the thought processes of NLMs are notoriously opaque, making it difficult to explain NLMs’ decisions to people. This lack of explainability also makes it harder for humans to debug AI systems when they are behaving problematically. To address this issue, our ICML paper proposes UNIREX, a unified learning framework for rationale extraction that explains an NLM’s decision for a given input text by highlighting the words that most influenced the decision.

Our extensive empirical studies show that UNIREX far outperforms other rationale extraction methods in balancing faithfulness, plausibility, and task performance. Surprisingly, in real-world scenarios with limited labeled data, UNIREX is still effective and able to achieve high explainability when trained with a very small number of annotated rationales. Furthermore, the explainability of UNIREX rationale extractors can even generalize to datasets and tasks that are completely unseen during training!” Xiang Ren

Independent policy gradient for large-scale Markov potential games: Sharper rates, function approximation, and game-agnostic convergence

Dongsheng Ding (University of Southern California), Chen-Yu Wei (University of Southern California), Mihailo Jovanovic (University of Southern California), Kaiqing Zhang (MIT)

“Can many independent agents learn good policies? This is an interesting question for real-world systems with multiple agents, from players in video games and robots in surveillance to bidders in real-time bidding. Searching for policies by multiple agents in tandem using reinforcement learning (RL) techniques has achieved great empirical success in playing video games such as StarCraft. However, it is crucial to scale existing RL methods in the number of agents and the size of the state space, since both are enormous in real multi-agent systems.

We have developed a simple and natural method that solves a major problem in multi-agent RL. Regardless of the number of agents and the size of the state space, agents can myopically maximize their private rewards by independently searching for better policies, without communicating with each other. This significantly advances the state of the art of RL for multi-agent systems and, more generally, the field of cooperative AI. Aside from being independent of other agents, we found that agents can learn good policies without knowing the type of game being played. This makes our method easy to use in both cooperative and competitive AI systems.” Dongsheng Ding

Personalization improves the tradeoffs between privacy and accuracy in federated learning

Alberto Bietti (NYU), Chen-Yu Wei (University of Southern California), Miro Dudik (Microsoft Research), John Langford (Microsoft Research), Steven Wu (Carnegie Mellon University)

“We often rely on recommendation systems to make decisions, for example, to help us choose restaurants, movies, music, news, shopping and more. Recommender systems need to collect feedback from users and build a collective model. Since each user has their own preferences, the system may also need a ‘personalized model.’ While the primary goal is to provide good recommendations, such a system is subject to privacy restrictions. That is, users may not want to share their precise data, such as location and transaction information. Of course, the less data users want to share, the less accurate the model is. How can we build a good system under such privacy restrictions?

This is a relevant question in the field of Personalized Federated Learning. We propose a system structure that uses personalized models to respect users’ privacy needs. The personalized model can be kept on the user side, and training it will not cause privacy leakage. On the other hand, the accuracy of the global model is highly dependent on how much data users choose to share. Therefore, to find a balance between privacy and accuracy, we control the relative weighting between the global model and the personalized models. We show in theory and experiments that by properly tuning the relative learning rates between the global and personalized models, the system can achieve better accuracy under a tight privacy constraint.” Chen-Yu Wei
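To make the global/personalized split concrete, here is a minimal single-user sketch, not the paper's exact algorithm: it assumes a simple additive combination of a shared global linear model and a local personalized correction trained with separate learning rates, and it omits the privacy mechanism (e.g., noising of the shared update) entirely. All names and values are illustrative.

```python
import numpy as np

def local_round(w_global, w_personal, X, y, lr_global=0.05, lr_personal=0.2):
    """One round of updates on a single user's private data (squared loss).

    The user's prediction is X @ (w_global + w_personal). The personalized part
    stays on the device; only the global update would be sent to the server,
    where (in the paper's setting) it would additionally be privatized.
    """
    residual = X @ (w_global + w_personal) - y
    grad = X.T @ residual / len(y)                    # gradient of the combined model
    w_personal_new = w_personal - lr_personal * grad  # local step, never shared
    delta_global = -lr_global * grad                  # shared update (noising omitted)
    return w_personal_new, delta_global

# Tiny single-user illustration: the combined model fits the user's data, and the
# ratio lr_personal / lr_global controls how much is learned locally vs. globally.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0])
w_g, w_p = np.zeros(4), np.zeros(4)
for _ in range(300):
    w_p, d_g = local_round(w_g, w_p, X, y)
    w_g = w_g + d_g                                   # server-side aggregation
print(np.round(w_g + w_p, 2))                         # combined model ≈ least-squares fit
```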

Released on July 21, 2022

Last updated on July 21, 2022
