Multi-Agent Reinforcement Learning

  • I worked on multi-agent self-play in Atari games, in both collaborative and competitive settings.
  • I used variational autoencoders to disentangle multiple near-optimal policies, with each policy indexed by a learned latent code (see the sketch after this list).
  • Our initial results on multi-agent Capture the Flag (CTF) gave a win probability of 72%, close to the 80% state-of-the-art value and well above the 40% human score.
  • I worked on a generative model for InfoRL that keeps latent-code generation unsupervised, so that standard MARL algorithms can be used with InfoRL without modification.
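
The following is a minimal sketch of the latent-code idea behind the VAE bullet above; it is not the original implementation, and the module names (`TrajectoryEncoder`, `LatentConditionedPolicy`), dimensions, and loss weights are assumptions for illustration. The encoder maps a trajectory to a latent code and the policy reconstructs actions conditioned on that code, so distinct near-optimal behaviours separate into distinct codes.

```python
# Sketch: VAE-style disentanglement of near-optimal policies via latent codes.
# Assumed architecture, not the original model; sizes and names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, LATENT_DIM = 32, 6, 4  # hypothetical sizes

class TrajectoryEncoder(nn.Module):
    """q(z | tau): encodes a trajectory of (state, action) pairs into a Gaussian latent."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE_DIM + ACTION_DIM, 64, batch_first=True)
        self.mu = nn.Linear(64, LATENT_DIM)
        self.logvar = nn.Linear(64, LATENT_DIM)

    def forward(self, states, actions_onehot):
        x = torch.cat([states, actions_onehot], dim=-1)    # (B, T, S+A)
        _, h = self.rnn(x)                                  # h: (1, B, 64)
        h = h.squeeze(0)
        return self.mu(h), self.logvar(h)

class LatentConditionedPolicy(nn.Module):
    """pi(a | s, z): reconstructs actions from states and the latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, states, z):
        z = z.unsqueeze(1).expand(-1, states.size(1), -1)   # broadcast z over time
        return self.net(torch.cat([states, z], dim=-1))     # action logits

def vae_loss(policy_logits, actions, mu, logvar, beta=0.1):
    # Reconstruction: the latent-conditioned policy must explain the demonstrated actions.
    recon = F.cross_entropy(policy_logits.flatten(0, 1), actions.flatten())
    # KL to a standard normal prior keeps the latent space usable for sampling new codes.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Usage with dummy data standing in for near-optimal trajectories.
encoder, policy = TrajectoryEncoder(), LatentConditionedPolicy()
states = torch.randn(8, 20, STATE_DIM)                      # 8 trajectories, 20 steps each
actions = torch.randint(0, ACTION_DIM, (8, 20))
mu, logvar = encoder(states, F.one_hot(actions, ACTION_DIM).float())
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparameterisation trick
loss = vae_loss(policy(states, z), actions, mu, logvar)
loss.backward()
```

Because the latent code is inferred from trajectories alone, this setup stays unsupervised in the sense of the InfoRL bullet: a downstream MARL algorithm can simply sample a code from the prior and condition its policy on it.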