Research Scientist
Facebook AI Research
abhshkdz at fb dot com



I am a Research Scientist at Facebook AI Research (FAIR). Previously, I was a Computer Science PhD student at Georgia Tech, advised by Dhruv Batra, and working closely with Devi Parikh.

My research focuses on deep learning and its applications in climate change, and in building agents that can see (computer vision), think (reasoning/interpretability), talk (language modeling), and act (reinforcement learning). My CV is available here.

IIT Roorkee
2011 - 2015
Queensland Brain Institute
Summer 2015
Virginia Tech
2015 - 2016
Georgia Tech
2017 - 2020
Facebook AI Research
S2017, W2018, S2018
DeepMind
Winter 2019
Tesla Autopilot
Summer 2019
Facebook AI Research

During my PhD, I spent three wonderful semesters as an intern at Facebook AI Research: Summer 2017 and Spring 2018 at Menlo Park, working with Georgia Gkioxari, Devi Parikh and Dhruv Batra on training embodied agents for navigation and question-answering in simulated environments, and Summer 2018 at Montréal, working with Mike Rabbat and Joelle Pineau on communication protocols in large-scale multi-agent reinforcement learning.

In 2019, I was fortunate to get the opportunity to spend time at DeepMind in London working on grounded language learning with Felix Hill, Laura Rimell, and Stephen Clark, and at Tesla Autopilot in Palo Alto working on differentiable neural architecture search with Andrej Karpathy.

My PhD research was supported by fellowships from Facebook, Adobe, and Snap.

Prior to joining grad school, I worked on neural coding in zebrafish tectum as an intern under Prof. Geoffrey Goodhill and Lilach Avitan at the Goodhill Lab, Queensland Brain Institute.

I graduated from the Indian Institute of Technology Roorkee in 2015. During my undergrad years, I was selected twice for Google Summer of Code (2013 and 2014), won several hackathons and security contests (Yahoo! HackU!, Microsoft Code.Fun.Do., Deloitte CCTC 2013 and 2014), and was an active member of SDSLabs.

On the side, I maintain a page of countdowns to a bunch of CV/NLP/ML/AI conference deadlines and a page of statistics of industry job offers in AI. Previously, I built neural-vqa and its extension neural-vqa-attention, HackFlowy, graf, Erdős, etc. I often tweet and post pictures from my travels on Instagram and Tumblr.

Blog posts from a previous life.


Rotation Invariant Graph Neural Networks using Spin Convolutions

Muhammed Shuaibi, Adeesh Kolluru, Abhishek Das, Aditya Grover, Anuroop Sriram, Zachary Ulissi, C. Lawrence Zitnick

Automated Video Description for Blind and Low Vision Users

Aditya Bodi, Pooyan Fazli, Shasta Ihorn, Yue-Ting Siu, Andrew T Scott, Lothar Narins, Yash Kant, Abhishek Das, Ilmi Yoon
CHI EA 2021

Auxiliary Tasks and Exploration Enable ObjectNav

Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans
ICCV 2021
Paper Code Website

ForceNet: A Graph Neural Network for Large-Scale Quantum Calculations

Weihua Hu, Muhammed Shuaibi, Abhishek Das, Siddharth Goyal, Anuroop Sriram, Jure Leskovec, Devi Parikh, C. Lawrence Zitnick

The Open Catalyst 2020 (OC20) Dataset and Community Challenges

Lowik Chanussot*, Abhishek Das*, Siddharth Goyal*, Thibaut Lavril*, Muhammed Shuaibi*, Morgane Rivière, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi
ACS Catalysis 2021
Paper Code

An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage

C. Lawrence Zitnick, Lowik Chanussot, Abhishek Das, Siddharth Goyal, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Thibaut Lavril, Aini Palizhati, Morgane Rivière, Muhammed Shuaibi, Anuroop Sriram, Kevin Tran, Brandon Wood, Junwoong Yoon, Devi Parikh, Zachary Ulissi

Auxiliary Tasks Speed Up Learning PointGoal Navigation

Joel Ye, Dhruv Batra, Erik Wijmans*, Abhishek Das*
CoRL 2020
Paper Code

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

Vishvak Murahari, Dhruv Batra, Devi Parikh, Abhishek Das
ECCV 2020
Paper Code

Probing Emergent Semantics in Predictive Agents via Question Answering

Abhishek Das*, Federico Carnevale*, Hamza Merzic, Laura Rimell, Rosalia Schneider, Josh Abramson, Alden Hung, Arun Ahuja, Stephen Clark, Gregory Wayne, Felix Hill
ICML 2020
Paper Presentation video Slides

Feel The Music: Automatically Generating A Dance For An Input Song

Purva Tendulkar, Abhishek Das, Aniruddha Kembhavi, Devi Parikh
ICCC 2020
Paper Code Videos

IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL

Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam
IJCAI-PRICAI 2020, ICLR 2019 Task-Agnostic RL Workshop

Improving Generative Visual Dialog by Answering Diverse Questions

Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das
EMNLP 2019
Paper Code

TarMAC: Targeted Multi-Agent Communication

Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau
ICML 2019
Paper Slides

Embodied Question Answering in Photorealistic Environments with Point Clouds

Erik Wijmans*, Samyak Datta*, Oleksandr Maksymets*, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra
CVPR 2019 (Oral)

Audio-Visual Scene-Aware Dialog

Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Stefan Lee, Peter Anderson, Irfan Essa, Devi Parikh, Dhruv Batra, Anoop Cherian, Tim K. Marks, Chiori Hori
CVPR 2019
Paper Code

End-to-end Audio Visual Scene-Aware Dialog Using Multimodal Attention-based Video Features

Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh

Neural Modular Control for Embodied Question Answering

Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
CoRL 2018 (Spotlight)
Paper Presentation video Slides

Embodied Question Answering

Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
CVPR 2018 (Oral)
Paper Code Presentation video Slides

Evaluating Visual Conversational Agents via Cooperative Human-AI Games

Prithvijit Chattopadhyay*, Deshraj Yadav*, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, Devi Parikh
HCOMP 2017
Paper Code

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

Abhishek Das*, Satwik Kottur*, Stefan Lee, José M.F. Moura, Dhruv Batra
ICCV 2017 (Oral)
Paper Code Presentation video Slides

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
IJCV 2019, ICCV 2017, NIPS 2016 Interpretable ML for Complex Systems Workshop
Paper Code Demo

Visual Dialog

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M.F. Moura, Devi Parikh, Dhruv Batra
PAMI 2018, CVPR 2017 (Spotlight)
Paper Code AMT chat interface Demo Presentation video Slides

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

Abhishek Das*, Harsh Agrawal*, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra
CVIU 2017, EMNLP 2016, ICML 2016 Workshop on Visualization for Deep Learning
Paper Project+Dataset neural-vqa-attention


Side projects


Statistics of industry job offers in Artificial Intelligence (AI). All data is anonymous, cross-verified against offer letters, and will hopefully reduce information asymmetry.


A webpage to keep track of CV/NLP/ML/AI conference deadlines. It's hosted on GitHub, and countdowns are automatically updated via pull requests to the data file in the repo.


Torch implementation of an attention-based visual question answering model (Yang et al., CVPR16). The model looks at an image, reads a question, and comes up with an answer to the question and a heatmap of where it looked in the image to answer it. Some results here.


neural-vqa is an efficient, GPU-based Torch implementation of the visual question answering model from the NIPS 2015 paper 'Exploring Models and Data for Image Question Answering' by Ren et al.


Erdős by SDSLabs is a competitive math learning platform, similar in spirit to Project Euler, albeit more feature-packed (support for holding competitions, has a social layer) and prettier.


graf plots pretty git contribution bar graphs in the terminal. To install, run: gem install graf


An open-source clone of WorkFlowy, a beautiful, list-based note-taking website that has a 500-item monthly limit on the free tier :-(. "Make lists. Not war." :-)


AirMaps was a fun hackathon project that lets users navigate through Google Earth with gestures and speech commands using a Kinect sensor. It was the winning entry in Microsoft Code.Fun.Do.


Another fun hackathon-winning project, built during Yahoo! HackU! 2012, that involves WebRTC-based P2P video chat and was faster than any other video chat provider (at the time, before Google launched Hangouts).


Ugly-looking, but super-effective bash script for downloading entire playlists from 8tracks. (Still works as of 10/2016).