Bio
I’m the Co-CEO and Co-Founder of Yutori. Almost everything in computing has been reinvented in the last thirty years, except how we interact with the web. It’s still a person, a browser, and endless clicking, scrolling, filling forms, fighting popups and ads. At Yutori, we’re building agents that browse and act on the web for you, autonomously.
During my PhD at Georgia Tech (2016–2020), I did some of the earliest work on agents that can see, talk, and act, e.g. visual chatbots trained with deep reinforcement learning, embodied agents that navigate and answer questions, and attention-based multi-agent communication. My labmates and I also developed Grad-CAM (42k+ citations), a general method for interpreting neural networks. Along the way, I interned at FAIR, DeepMind, and Tesla Autopilot.
My PhD thesis was a runner-up for the 2020 AAAI/ACM SIGAI Doctoral Dissertation Award.
I’ve also spent time at Fundamental AI Research (FAIR) at Meta, where I helped start the Open Catalyst Project (now FAIR Chemistry) to accelerate electrocatalyst discovery. My teammates and I developed large datasets like OC20 and OC22, and state-of-the-art models like GemNet-OC, EquiformerV2, and UMA, which have sped up DFT calculations by over 2000x.
I got my Bachelor’s at IIT Roorkee. On the side, I’ve built aideadlin.es, aipaygrad.es, and other things, and I occasionally dabble in generative art.
Talks and Interviews
Publications
My papers have been cited 52,473 times. See Google Scholar for an up-to-date list.
Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields
TMLR 2024
Paper

The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture

EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations

AdsorbML: Accelerating Adsorption Energy Calculations with Machine Learning

PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav

The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysis

GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets

Open Challenges in Developing Generalizable Large Scale Machine Learning Models for Catalyst Discovery
ACS Catalysis (Perspective) 2022
Paper

Transfer learning using attentions across atomic systems with graph neural networks (TAAG)
The Journal of Chemical Physics 2022
Paper
Code

Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
CVPR 2022
Paper
Code
Website
Presentation video

Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations

Automated Video Description for Blind and Low Vision Users
CHI EA 2021
Paper

ForceNet: A Graph Neural Network for Large-Scale Quantum Calculations
ICLR 2021 Deep Learning for Simulation Workshop
Paper
opencatalystproject.org
Presentation video

The Open Catalyst 2020 (OC20) Dataset and Community Challenges
ACS Catalysis 2021
Paper
Code
Dataset
opencatalystproject.org

An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage
"Facebook and Carnegie Mellon launch .. to ... store renewable energy" by Larry Zitnick
"Facebook A.I. researchers push for a breakthrough in renewable energy storage" by Jeremy Kahn
"Facebook deploys its AI to find green energy storage solutions" by Andrew Tarantola
"Facebook to use artificial intelligence in bid to improve renewable energy storage" by Sam Shead
"Facebook and Carnegie Mellon launch project to ... store renewable energy" by Kyle Wiggers
"Facebook plans to use AI to help fight climate change" by Queenie Wong
"Facebook & CMU Open Catalyst Project Applies AI to Renewable Energy Storage" by Fangyu Cai
Building agents that can see, talk, and act
AAAI/ACM SIGAI Doctoral Dissertation Award, Runner-up
Georgia Tech Sigma Xi Best PhD Thesis Award
Georgia Tech College of Computing Dissertation Award
PhD Thesis
Probing Emergent Semantics in Predictive Agents via Question Answering
ICML 2020
Paper
Presentation video
Slides

IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL
IJCAI-PRICAI 2020, ICLR 2019 Task-Agnostic RL Workshop
Paper

Embodied Question Answering in Photorealistic Environments with Point Clouds
CVPR 2019 (Oral)
Paper

Audio-Visual Scene-Aware Dialog
CVPR 2019
Paper
Code
video-dialog.com

End-to-end Audio Visual Scene-Aware Dialog Using Multimodal Attention-based Video Features
ICASSP 2019
Paper
video-dialog.com

Neural Modular Control for Embodied Question Answering
CoRL 2018 (Spotlight)
Paper
embodiedqa.org
Presentation video
Slides

Embodied Question Answering
CVPR 2018 (Oral)
Paper
embodiedqa.org
Code
Presentation video
Slides

"Embodied Question Answering" by Abhishek Das
"... a goal-driven approach to autonomous agents" by Dhruv Batra, Devi Parikh
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
ICCV 2017 (Oral)
Paper
Code
Presentation video
Slides

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
IJCV 2019, ICCV 2017, NIPS 2016 Interpretable ML for Complex Systems Workshop
Paper
Code
Demo

Visual Dialog
PAMI 2018, CVPR 2017 (Spotlight)
Paper
Code
visualdialog.org
AMT chat interface
Demo
Presentation video
Slides

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
CVIU 2017, EMNLP 2016, ICML 2016 Workshop on Visualization for Deep Learning
Paper
Project+Dataset
neural-vqa-attention

"Is Artificial Intelligence Permanently Inscrutable?" by Aaron Bornstein
"Deep learning is creating computer systems we don't fully understand" by James Vincent
"Robot eyes and humans fix on different things to decode a scene" by Aviva Rutkin
"Robots and humans see the world differently – but we don't know why" by Duncan Geere
Side projects
aipaygrad.es
aipaygrad.es provides statistics on industry job offers in Artificial Intelligence (AI). All data is anonymous and cross-verified against offer letters, and will hopefully reduce information asymmetry.
aideadlin.es
aideadlin.es is a webpage to keep track of CV/NLP/ML/AI conference deadlines. It's hosted on GitHub, and countdowns are automatically updated via pull requests to the data file in the repo.
neural-vqa-attention
Torch implementation of an attention-based visual question answering model (Yang et al., CVPR 2016).
The model looks at an image, reads a question, and comes up with an answer to the question and a heatmap of where it looked in the image to answer it.
Some results here.
neural-vqa
neural-vqa is an efficient, GPU-based Torch implementation of the visual question answering model from the NIPS 2015 paper 'Exploring Models and Data for Image Question Answering' by Ren et al.
Erdős
Erdős by SDSLabs is a competitive math learning platform, similar in spirit to Project Euler, but prettier and more feature-packed (it supports holding competitions and has a social layer).
graf
graf plots pretty git contribution bar graphs in the terminal.
Run gem install graf to install.
HackFlowy
HackFlowy is an open-source clone of WorkFlowy.com, a beautiful, list-based note-taking website that has a 500-item monthly limit on the free tier :-(. "Make lists. Not war." :-)
AirMaps
AirMaps was a fun hackathon project that lets users navigate through Google Earth with gestures and speech commands using a Kinect sensor. It was the winning entry in Microsoft Code.Fun.Do.
HackView
Another fun hackathon-winning project, built during Yahoo! HackU! 2012. It used WebRTC-based P2P video chat and was faster than any other video chat provider at the time (before Google launched Hangouts).
8tracks-downloader
Ugly-looking but super-effective bash script for downloading entire playlists from 8tracks. (Still works as of 10/2016.)