Multi-Agent Reinforcement Learning

  • I worked on multi-agent self-play in Atari games, in both collaborative and competitive settings.
  • I used variational autoencoders to disentangle multiple near-optimal policies, with each policy indexed by a learned latent code (see the sketch after this list).
  • Our initial results on multi-agent Capture the Flag (CTF) gave a win probability of 72%, close to the 80% state-of-the-art value and well above the 40% human score.
  • I worked on a generative model for InfoRL that keeps latent-code generation unsupervised, so that standard MARL algorithms can be used with InfoRL without modification.
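
The following is a minimal sketch of the latent-code idea behind the VAE bullet above; it is not the original implementation, and the module names (`TrajectoryEncoder`, `LatentConditionedPolicy`), dimensions, and loss weights are assumptions for illustration. The encoder maps a trajectory to a latent code and the policy reconstructs actions conditioned on that code, so distinct near-optimal behaviours separate into distinct codes.

```python
# Sketch: VAE-style disentanglement of near-optimal policies via latent codes.
# Assumed architecture, not the original model; sizes and names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, LATENT_DIM = 32, 6, 4  # hypothetical sizes

class TrajectoryEncoder(nn.Module):
    """q(z | tau): encodes a trajectory of (state, action) pairs into a Gaussian latent."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE_DIM + ACTION_DIM, 64, batch_first=True)
        self.mu = nn.Linear(64, LATENT_DIM)
        self.logvar = nn.Linear(64, LATENT_DIM)

    def forward(self, states, actions_onehot):
        x = torch.cat([states, actions_onehot], dim=-1)    # (B, T, S+A)
        _, h = self.rnn(x)                                  # h: (1, B, 64)
        h = h.squeeze(0)
        return self.mu(h), self.logvar(h)

class LatentConditionedPolicy(nn.Module):
    """pi(a | s, z): reconstructs actions from states and the latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, states, z):
        z = z.unsqueeze(1).expand(-1, states.size(1), -1)   # broadcast z over time
        return self.net(torch.cat([states, z], dim=-1))     # action logits

def vae_loss(policy_logits, actions, mu, logvar, beta=0.1):
    # Reconstruction: the latent-conditioned policy must explain the demonstrated actions.
    recon = F.cross_entropy(policy_logits.flatten(0, 1), actions.flatten())
    # KL to a standard normal prior keeps the latent space usable for sampling new codes.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Usage with dummy data standing in for near-optimal trajectories.
encoder, policy = TrajectoryEncoder(), LatentConditionedPolicy()
states = torch.randn(8, 20, STATE_DIM)                      # 8 trajectories, 20 steps each
actions = torch.randint(0, ACTION_DIM, (8, 20))
mu, logvar = encoder(states, F.one_hot(actions, ACTION_DIM).float())
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparameterisation trick
loss = vae_loss(policy(states, z), actions, mu, logvar)
loss.backward()
```

Because the latent code is inferred from trajectories alone, this setup stays unsupervised in the sense of the InfoRL bullet: a downstream MARL algorithm can simply sample a code from the prior and condition its policy on it.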