Just a big ol' list. Suggestions welcome.

Papers

| Title | Venue | Year | Note |
| --- | --- | --- | --- |
| Minds, brains, and programs | | 1980 | Philosophy |
| The symbol grounding problem | | 1990 | Philosophy |
| Scripts, Plans, Goals, and Understanding | | 1977 | Origins |
| Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data | | 2020 | Philosophy |
| Experience Grounds Language | | 2020 | Philosophy |
| Robots That Use Language | | 2020 | Philosophy |
| Spoken language interaction with robots: Recommendations for future research | | 2022 | Philosophy |
| SHRDLU | MIT Report | 1971 | Origins |
| Walk the talk: connecting language, knowledge, and action in route instructions | AAAI | 2006 | Origins |
| Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions | TACL | 2013 | Origins |
| RLBench | | | Simulation |
| AI2-THOR | | | Simulation |
| ManipulaTHOR | | | Simulation |
| PyBullet | | | Simulation (see sketch after the table) |
| Isaac Gym | | | Simulation |
| Omniverse | | | Simulation |
| RoboCasa | | | Simulation |
| TextWorld | | | Simulation |
| VirtualHome | | | Simulation |
| ProcTHOR | | | Simulation |
| RoboTHOR | | | Simulation |
| AirSim | | | Simulation |
| iGibson | | | Simulation |
| Habitat | | | Simulation |
| ThreeDWorld | | | Simulation |
| BEHAVIOR-1K | | | Simulation |
| Learning to Interpret Natural Language Navigation Instructions from Observations | AAAI | 2011 | Discrete |
| Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments | CVPR | 2018 | Discrete |
| Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding | EMNLP | 2020 | Discrete |
| Learning to Follow Navigational Directions | ACL | 2010 | Continuous |
| Vision and Language Navigation in Continuous Environments | ECCV | 2020 | Continuous |
| Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction | CoRL | 2018 | Continuous |
| CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory | RSS | | Continuous |
| Learning Semantic Maps from Natural Language Descriptions | RSS | 2013 | Mapping |
| Self-Monitoring Navigation Agent via Auxiliary Progress Estimation | ICLR | 2019 | Replanning |
| Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation | CVPR | 2019 | Replanning |
| Emergence of Maps in the Memories of Blind Navigation Agents | ICLR | 2023 | Mapping |
| Iterative Vision-and-Language Navigation | CVPR | 2023 | Replanning |
| Sim-to-Real Transfer for Vision-and-Language Navigation | CoRL | 2020 | Sim2Real |
| Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation | ICRA | 2021 | Sim2Real |
| Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight | CoRL | 2019 | Continuous |
| Natural Language Communication with Robots | NAACL | 2016 | Real values |
| ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks | CVPR | 2020 | State Tracking and Task Planning |
| A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution | CoRL | 2021 | State Tracking and Task Planning |
| FILM: Following Instructions in Language with Modular Methods | ICLR | 2022 | State Tracking and Task Planning |
| Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | | 2022 | LM Planners (see sketch after the table) |
| ProgPrompt: Generating Situated Robot Task Plans using Large Language Models | | 2023 | LM Planners |
| Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following | | 2023 | LM Planners |
| Code as Policies: Language Model Programs for Embodied Control | | | LM Planners |
| Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | | 2022 | LM Planners |
| Visual Language Maps for Robot Navigation | | 2023 | Large _____ Models |
| Grounding Language with Visual Affordances over Unstructured Data | | 2023 | Large _____ Models |
| CLIPort: What and Where Pathways for Robotic Manipulation | | 2021 | Large _____ Models |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | | 2022 | Large _____ Models |
| Transporter Networks: Rearranging the Visual World for Robotic Manipulation | | 2020 | Manipulators & Representing Space |
| VIMA: General Robot Manipulation with Multimodal Prompts | NeurIPS FMDM Workshop | 2022 | Manipulators & Representing Space |
| Language Conditioned Imitation Learning over Unstructured Data | | 2021 | Manipulators & Representing Space |
| VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation | NeurIPS Datasets and Benchmarks | 2022 | Manipulators & Representing Space |
| Chasing Ghosts: Instruction Following as Bayesian State Tracking | NeurIPS | 2019 | Imagination |
| Prospection: Interpretable Plans From Language By Predicting the Future | ICRA | 2019 | Imagination |
| Learning Universal Policies via Text-Guided Video Generation | | 2023 | Imagination |
| Diffusion-based Generation, Optimization, and Planning in 3D Scenes | | 2023 | Imagination |
| Mastering Diverse Domains through World Models | | 2023 | Imagination |
| DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics | | 2023 | Imagination |
| Scaling Robot Learning with Semantically Imagined Experience | | 2023 | Imagination |
| Reward-rational (implicit) choice: A unifying formalism for reward learning | | | Pragmatics |
| Legibility and Predictability of Robot Motion | | | Pragmatics |
| Learning Language Games through Interaction | ACL | 2016 | Concept Learning |
| Naturalizing a Programming Language via Interactive Learning | ACL | 2017 | Concept Learning |
| Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following | CoRL | 2020 | Concept Learning |
| Learning from Unscripted Deictic Gesture and Language for Human-Robot Interactions | AAAI | 2014 | Concept Learning |
| Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy" | IJCAI | 2016 | Concept Learning |
| Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | | | Concept Learning |
| Correcting Robot Plans with Natural Language Feedback | | | Feedback |
| DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents | | 2022 | Feedback |
| "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy | | 2023 | Feedback |
| Interactive Language: Talking to Robots in Real Time | | 2022 | Feedback |
| Improving Grounded Natural Language Understanding through Human-Robot Dialog | ICRA | 2019 | Feedback |
| Asking for Help Using Inverse Semantics | RSS | 2014 | Feedback |
| Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning | EMNLP | 2019 | Feedback |
| Miscommunication Detection and Recovery in Situated Human–Robot Dialogue | ACM Transactions on Interactive Intelligent Systems | 2019 | Feedback |
| Speaker-Follower Models for Vision-and-Language Navigation | NeurIPS | 2018 | Feedback |
| Cooperative Vision-and-Dialog Navigation | CoRL | 2019 | Feedback |
| Back to the Blocks World: Learning New Actions through Situated Human-Robot Dialogue | SIGDIAL | 2014 | Feedback |
| TEACh: Task-driven Embodied Agents that Chat | AAAI | 2022 | Feedback |
| Collaborative Dialogue in Minecraft | ACL | 2019 | Feedback |
| MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks | EMNLP | 2021 | Feedback |
| DANLI: Deliberative Agent for Following Natural Language Instructions | EMNLP | 2022 | Feedback |
| Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue | EMNLP | 2022 | Feedback |
| LLM Powered Autonomous Agents (blog) | | | Planning |
| LaMPP: Language Models as Probabilistic Priors for Perception and Action | | | Planning |
| LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | | | Planning (see sketch after the table) |
| Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling | | | Planning |
| Grounding Classical Task Planners via Vision-Language Models | | | Planning |
| Language Models as Zero-Shot Trajectory Generators | | | Planning |
| ViNT: A Foundation Model for Visual Navigation | | | Navigation |
| SACSoN: Scalable Autonomous Data Collection for Social Navigation | | | Navigation |
| A System for Generalized 3D Multi-Object Search | | | Navigation |
| CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation | | | Navigation |
| Principles and Guidelines for Evaluating Social Robot Navigation Algorithms | | | Navigation |
| RVT: Robotic View Transformer for 3D Object Manipulation | | | Manipulation |
| Physically Grounded Vision-Language Models for Robotic Manipulation | | | Manipulation |
| LATTE: LAnguage Trajectory TransformEr | | | Manipulation |
| LIV: Language-Image Representations and Rewards for Robotic Control | | | Manipulation |
| Gesture-Informed Robot Assistance via Foundation Models | | | Manipulation |
| SPRINT: Semantic Policy Pre-training via Language Instruction Relabeling | | | Manipulation |
| Language-Driven Representation Learning for Robotics | | | Manipulation |
| EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | | | Manipulation |
| VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | | | Manipulation |
| VIMA: General Robot Manipulation with Multimodal Prompts | | | Manipulation |
| Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks | | | Manipulation |
| Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control | | | Mobile Manipulation |
| Open-World Object Manipulation using Pre-Trained Vision-Language Models | | | Mobile Manipulation |
| Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | | | Mobile Manipulation |
| HomeRobot: Open-Vocabulary Mobile Manipulation | | | Mobile Manipulation |
| LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place | | | Mobile Manipulation |
| Spatial-Language Attention Policies | | | Mobile Manipulation |
| TidyBot: Personalized Robot Assistance with Large Language Models | | | Mobile Manipulation |
| SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning | | | Mobile Manipulation |
| Language to Rewards for Robotic Skill Synthesis | | | Language to Motion |
| Text2Motion: From Natural Language Instructions to Feasible Plans | | | Language to Motion |
| SayTap: Language to Quadrupedal Locomotion | | | Language to Motion |
| Natural Language Can Help Bridge the Sim2Real Gap | | | Sim2Real |
| ChatGPT for Robotics: Design Principles and Model Abilities | | | |
| SEAGULL: An Embodied Agent for Instruction Following through Situated Dialog | | | Dialogue, QA |
| SQA3D: Situated Question Answering in 3D Scenes | | | Dialogue, QA |
| Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners | | | Dialogue, QA |
| Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning | | | Dialogue, QA |
| DROC: Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections | | | Dialogue, QA |
| Large Language Models as General Pattern Machines | | | |
| Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? | | | |
| MotionGPT | | | |
| Modeling Dynamic Environments with Scene Graph Memory | | | |
| Affordances from Human Videos as a Versatile Representation for Robotics | | | |
| RoboCat: A self-improving robotic agent | | | |
| ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification | | | |
| Language Is Not All You Need: Aligning Perception with Language Models | | | |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | | | |
| Behavior Transformers: Cloning k modes with one stone | | | |
| From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data | | | |
| Affordance Diffusion: Synthesizing Hand-Object Interactions | | | |
| Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment | | | |
| R3M: A Universal Visual Representation for Robot Manipulation | | | |
| Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics | | | |
| LangSplat: 3D Language Gaussian Splatting | | | |
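
Code sketches

A few entries above are easier to evaluate with a runnable toy in hand. The sketches below are illustrations under stated assumptions, not implementations from the papers.

For the PyBullet row under Simulation: a minimal smoke test, assuming `pip install pybullet` (the `plane.urdf` and `r2d2.urdf` assets ship with `pybullet_data`):

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # headless physics server; p.GUI opens a viewer instead
p.setAdditionalSearchPath(pybullet_data.getDataPath())  # find bundled URDFs
p.setGravity(0, 0, -9.8)

plane = p.loadURDF("plane.urdf")
robot = p.loadURDF("r2d2.urdf", basePosition=[0, 0, 0.5])

for _ in range(240):  # one simulated second at the default 240 Hz timestep
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(robot)
print("robot base position:", pos)
p.disconnect()
```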
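For the "Language Models as Zero-Shot Planners" row: the paper's grounding trick is to let the LM propose a free-form step, then snap it to the nearest admissible skill in embedding space. A sketch of that matching step, where `llm_propose_step` and `embed` are hypothetical stand-ins for an LM call and a sentence encoder (the paper uses Sentence-BERT-style embeddings):

```python
import numpy as np

# The robot's executable skill vocabulary (the admissible actions).
ADMISSIBLE = ["walk to kitchen", "open fridge", "grab milk", "close fridge"]

def embed(text):
    # Hypothetical stand-in for a sentence encoder. Hash-seeded random
    # vectors keep the sketch self-contained, so the toy match below is
    # arbitrary; a real encoder returns semantically meaningful neighbors.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.normal(size=64)

def llm_propose_step(task, steps_so_far):
    # Hypothetical stand-in for prompting an LM with the task and plan prefix.
    return "pick up the milk carton"

def ground(step):
    # Snap a free-form step to the most similar admissible skill (cosine).
    q = embed(step)
    sims = []
    for action in ADMISSIBLE:
        a = embed(action)
        sims.append(q @ a / (np.linalg.norm(q) * np.linalg.norm(a)))
    return ADMISSIBLE[int(np.argmax(sims))]

plan = []
plan.append(ground(llm_propose_step("get milk from the fridge", plan)))
print(plan)  # with a real encoder this typically grounds to 'grab milk'
```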
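For the LLM+P row under Planning: the division of labor is the reverse of pure LM planning. The LLM only translates the request into a PDDL problem; a classical planner does the search and returns a valid plan. A sketch of that handoff, where `ask_llm` is a hypothetical stand-in for any chat-completion call, and Fast Downward (assumed installed and on PATH) is one example planner that writes its plan to `sas_plan` by default:

```python
import pathlib
import subprocess
import tempfile

# A tiny hand-written blocksworld domain; LLM+P keeps the domain fixed and trusted.
DOMAIN_PDDL = """(define (domain blocks)
  (:predicates (on ?x ?y) (clear ?x) (ontable ?x) (handempty) (holding ?x))
  (:action pick-up
    :parameters (?x)
    :precondition (and (clear ?x) (ontable ?x) (handempty))
    :effect (and (not (ontable ?x)) (not (clear ?x)) (not (handempty)) (holding ?x)))
  (:action stack
    :parameters (?x ?y)
    :precondition (and (holding ?x) (clear ?y))
    :effect (and (not (holding ?x)) (not (clear ?y)) (clear ?x) (handempty) (on ?x ?y))))"""

def ask_llm(prompt):
    # Hypothetical stand-in for an LLM call: a real call would return a PDDL
    # problem translated from the natural-language request. Canned here.
    return """(define (problem stack-red-on-blue) (:domain blocks)
  (:objects red blue)
  (:init (clear red) (clear blue) (ontable red) (ontable blue) (handempty))
  (:goal (on red blue)))"""

request = "Stack the red block on the blue block."
problem_pddl = ask_llm(f"Translate into a PDDL problem:\n{DOMAIN_PDDL}\n{request}")

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "domain.pddl").write_text(DOMAIN_PDDL)
(tmp / "problem.pddl").write_text(problem_pddl)

# Hand off to a classical planner; Fast Downward shown as one example.
subprocess.run(
    ["fast-downward.py", str(tmp / "domain.pddl"), str(tmp / "problem.pddl"),
     "--search", "astar(lmcut())"],
    check=True,
)
print(pathlib.Path("sas_plan").read_text())  # the planner's output plan
```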