abhshkdz at gatech dot edu
- [Nov 2019] Organizing the Visual Question Answering and Dialog Workshop at CVPR 2020.
- [Sep 2019] Organizing the Visually-Grounded Interaction & Language Workshop at NeurIPS 2019.
- [Jun 2019] Presenting Targeted Multi-Agent Communication as an oral at ICML 2019 (Video).
- [Mar 2019] Co-founded Caliper, which helps recruiters evaluate practical AI skills.
- [Feb 2019] My work was featured in this wonderful article by Georgia Tech.
- [Jan 2019] Awarded the Facebook Graduate Fellowship.
- [Jan 2019] Awarded the Microsoft Research PhD Fellowship (declined).
- [Jan 2019] Awarded the NVIDIA Graduate Fellowship (declined).
- [Jan 2019] Organizing the 2nd Visual Dialog Challenge!
- [Oct 2018] Presenting Neural Modular Control for Embodied Question Answering as a spotlight at CoRL 2018 (Video).
- [Sep 2018] Presenting results and analysis of the 1st Visual Dialog Challenge at ECCV 2018.
- [Jul 2018] Presenting a tutorial on Connecting Language and Vision to Actions at ACL 2018.
- [Jun 2018] Organizing the 1st Visual Dialog Challenge!
- [Jun 2018] Presenting Embodied Question Answering as an oral at CVPR 2018 (Video).
- [Jun 2018] Organizing the VQA Challenge and Visual Dialog Workshop at CVPR 2018.
- [Mar 2018] Speaking on Embodied Question Answering at NVIDIA GTC (Video).
- [Dec 2017] Awarded the Adobe Research Fellowship. (Department’s news story)
- [Dec 2017] Awarded the Snap Inc. Research Fellowship. (Department’s news story)
- [Oct 2017] Presenting our paper on Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning as an oral at ICCV 2017 (Video).
- [Jul 2017] Speaking about our work on Visual Dialog at the Visual Question Answering Challenge Workshop, CVPR 2017 (Video).
- [Jul 2017] Presenting our paper on Visual Dialog as a spotlight at CVPR 2017 (Video).
I am a 4th-year Computer Science PhD student at Georgia Tech, advised by Dhruv Batra and working closely with Devi Parikh. My research focuses on deep learning and its applications to building agents that can see (computer vision), think (reasoning/interpretability), talk (language modeling), and act (reinforcement learning).
I’ve spent three wonderful semesters as an intern at Facebook AI Research: Summer 2017 and Spring 2018 in Menlo Park, working with Georgia Gkioxari, Devi Parikh, and Dhruv Batra on training embodied agents for navigation and question-answering in simulated environments (see embodiedqa.org), and Summer 2018 in Montréal, working with Mike Rabbat and Joelle Pineau on emergent communication protocols in large-scale multi-agent reinforcement learning.
In 2019, I was fortunate to get the opportunity to spend time at DeepMind in London working on grounded language learning with Felix Hill, Laura Rimell, and Stephen Clark, and at Tesla Autopilot in Palo Alto working on differentiable neural architecture search with Andrej Karpathy.
I graduated from the Indian Institute of Technology Roorkee in 2015. During my undergrad years, I was selected twice for Google Summer of Code (2013 and 2014), won several hackathons and security contests (Yahoo! HackU!, Microsoft Code.Fun.Do., Deloitte CCTC 2013 and 2014), and was an active member of SDSLabs.
On the side, I built neural-vqa, an efficient Torch implementation of visual question answering (and its extension neural-vqa-attention), and I maintain aideadlin.es (countdowns to a bunch of CV/NLP/ML/AI conference deadlines) and several other side projects (HackFlowy, graf, etc.). I also help maintain Erdős, a competitive math learning platform I created during my undergrad. I often tweet, toot, and post pictures from my travels on Instagram and Tumblr.
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Improving Generative Visual Dialog by Answering Diverse Questions
Unsupervised Discovery of Decision States for Transfer in Reinforcement Learning
ICLR 2019 Task-Agnostic RL Workshop
TarMAC: Targeted Multi-Agent Communication
ICML 2019 (Oral)
Embodied Question Answering in Photorealistic Environments with Point Cloud Perception
CVPR 2019 (Oral)
Audio-Visual Scene-Aware Dialog
End-to-end Audio Visual Scene-Aware Dialog Using Multimodal Attention-based Video Features
Neural Modular Control for Embodied Question Answering
CoRL 2018 (Spotlight)
Embodied Question Answering
CVPR 2018 (Oral)
Evaluating Visual Conversational Agents via Cooperative Human-AI Games
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
ICCV 2017 (Oral)
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
IJCV 2019, ICCV 2017, NIPS 2016 Interpretable ML for Complex Systems Workshop
Visual Dialog
PAMI 2018, CVPR 2017 (Spotlight)
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
CVIU 2017, EMNLP 2016, ICML 2016 Workshop on Visualization for Deep Learning
AirMaps is a fun hackathon project that lets users navigate through Google Earth with gestures and speech commands using a Kinect sensor. It was the winning entry at Microsoft Code.Fun.Do.
Another fun hackathon-winning project, built during Yahoo! HackU! 2012: WebRTC-based P2P video chat that was faster than any other video chat provider at the time (before Google launched Hangouts).
An ugly-looking but super-effective bash script for downloading entire playlists from 8tracks (still works as of 10/2016).