| Title | Venue | Year | Topic |
| --- | --- | --- | --- |
| Minds, brains, and programs | | 1980 | Philosophy |
| The symbol grounding problem | | 1990 | Philosophy |
| Scripts, Plans, Goals, and Understanding | | 1977 | Origins |
| Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data | | 2020 | Philosophy |
| Experience Grounds Language | | 2020 | Philosophy |
| Robots That Use Language | | 2020 | Philosophy |
| Spoken language interaction with robots: Recommendations for future research | | 2022 | Philosophy |
| SHRDLU | MIT Report | 1971 | Origins |
| Walk the talk: connecting language, knowledge, and action in route instructions | AAAI | 2006 | Origins |
| Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions | TACL | 2013 | Origins |
| RLBench | | | Simulation |
| AI2Thor | | | Simulation |
| ManipulaThor | | | Simulation |
| PyBullet | | | Simulation |
| Isaac Gym | | | Simulation |
| Omniverse | | | Simulation |
| RoboCasa | | | Simulation |
| TextWorld | | | Simulation |
| VirtualHome | | | Simulation |
| ProcThor | | | Simulation |
| RoboThor | | | Simulation |
| AirSim | | | Simulation |
| iGibson | | | Simulation |
| Habitat | | | Simulation |
| ThreeDWorld | | | Simulation |
| Behavior-1K | | | Simulation |
| Learning to Interpret Natural Language Navigation Instructions from Observations | AAAI | 2011 | Discrete |
| Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments | CVPR | 2018 | Discrete |
| Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding | EMNLP | 2020 | Discrete |
| Learning to Follow Navigational Directions | ACL | 2010 | Continuous |
| Vision and Language Navigation in Continuous Environments | ECCV | 2020 | Continuous |
| Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction | CoRL | 2018 | Continuous |
| CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory | RSS | | Continuous |
| Learning Semantic Maps from Natural Language Descriptions | RSS | 2013 | Mapping |
| Self-Monitoring Navigation Agent via Auxiliary Progress Estimation | ICLR | 2019 | Replanning |
| Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation | CVPR | 2019 | Replanning |
| Emergence of Maps in the Memories of Blind Navigation Agents | ICLR | 2023 | Mapping |
| Iterative Vision-and-Language Navigation | CVPR | 2023 | Replanning |
| Sim-to-Real Transfer for Vision-and-Language Navigation | CoRL | 2020 | Sim2Real |
| Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation | ICRA | 2021 | Sim2Real |
| Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight | CoRL | 2019 | Continuous |
| Natural Language Communication with Robots | NAACL | 2016 | Real values |
| ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks | CVPR | 2020 | State Tracking and Task Planning |
| A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution | CoRL | 2021 | State Tracking and Task Planning |
| FILM: Following Instructions in Language with Modular Methods | ICLR | 2022 | State Tracking and Task Planning |
| Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | | 2022 | LM Planners |
| ProgPrompt: Generating Situated Robot Task Plans using Large Language Models | | 2023 | LM Planners |
| Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following | | 2023 | LM Planners |
| Code as Policies: Language Model Programs for Embodied Control | | | LM Planners |
| Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | | 2022 | LM Planners |
| Visual Language Maps for Robot Navigation | | 2023 | Large _____ Models |
| Grounding Language with Visual Affordances over Unstructured Data | | 2023 | Large _____ Models |
| CLIPort: What and Where Pathways for Robotic Manipulation | | 2021 | Large _____ Models |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | | 2022 | Large _____ Models |
| Transporter Networks: Rearranging the Visual World for Robotic Manipulation | | 2020 | Manipulators & Representing Space |
| VIMA: General Robot Manipulation with Multimodal Prompts | NeurIPS FMDM Workshop | 2022 | Manipulators & Representing Space |
| Language Conditioned Imitation Learning over Unstructured Data | | 2021 | Manipulators & Representing Space |
| VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation | NeurIPS Datasets and Benchmarks | 2022 | Manipulators & Representing Space |
| Chasing Ghosts: Instruction Following as Bayesian State Tracking | NeurIPS | 2019 | Imagination |
| Prospection: Interpretable Plans From Language By Predicting the Future | ICRA | 2019 | Imagination |
| Learning Universal Policies via Text-Guided Video Generation | | 2023 | Imagination |
| Diffusion-based Generation, Optimization, and Planning in 3D Scenes | | 2023 | Imagination |
| Mastering Diverse Domains through World Models | | 2023 | Imagination |
| DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics | | 2023 | Imagination |
| Scaling Robot Learning with Semantically Imagined Experience | | 2023 | Imagination |
| Reward-rational (implicit) choice: A unifying formalism for reward learning | | | Pragmatics |
| Legibility and Predictability of Robot Motion | | | Pragmatics |
| Learning Language Games through Interaction | ACL | 2016 | Concept Learning |
| Naturalizing a Programming Language via Interactive Learning | ACL | 2017 | Concept Learning |
| Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following | CoRL | 2020 | Concept Learning |
| Learning from Unscripted Deictic Gesture and Language for Human-Robot Interactions | AAAI | 2014 | Concept Learning |
| Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy” | IJCAI | 2016 | Concept Learning |
| Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | | | Concept Learning |
| Correcting Robot Plans with Natural Language Feedback | | | Feedback |
| DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents | | 2022 | Feedback |
| "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy | | 2023 | Feedback |
| Interactive Language: Talking to Robots in Real Time | | 2022 | Feedback |
| Improving Grounded Natural Language Understanding through Human-Robot Dialog | ICRA | 2019 | Feedback |
| Asking for Help Using Inverse Semantics | RSS | 2014 | Feedback |
| Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning | EMNLP | 2019 | Feedback |
| Miscommunication Detection and Recovery in Situated Human–Robot Dialogue | ACM Transactions on Interactive Intelligent Systems | 2019 | Feedback |
| Speaker-Follower Models for Vision-and-Language Navigation | NeurIPS | 2018 | Feedback |
| Cooperative Vision-and-Dialog Navigation | CoRL | 2019 | Feedback |
| Back to the Blocks World: Learning New Actions through Situated Human-Robot Dialogue | SigDial | 2014 | Feedback |
| TEACh: Task-driven Embodied Agents that Chat | AAAI | 2022 | Feedback |
| Collaborative Dialogue in Minecraft | ACL | 2019 | Feedback |
| MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks | EMNLP | 2021 | Feedback |
| DANLI: Deliberative Agent for Following Natural Language Instructions | EMNLP | 2022 | Feedback |
| Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue | EMNLP | 2022 | Feedback |
| LLM Powered Autonomous Agents (blog) | | | Planning |
| LaMPP: Language Models as Probabilistic Priors for Perception and Action | | | Planning |
| LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | | | Planning |
| Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling | | | Planning |
| Grounding Classical Task Planners via Vision-Language Models | | | Planning |
| Language Models as Zero-Shot Trajectory Generators | | | Planning |
| ViNT: A Foundation Model for Visual Navigation | | | Navigation |
| SACSoN: Scalable Autonomous Data Collection for Social Navigation | | | Navigation |
| A System for Generalized 3D Multi-Object Search | | | Navigation |
| CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation | | | Navigation |
| Principles and Guidelines for Evaluating Social Robot Navigation Algorithms | | | Navigation |
| RVT: Robotic View Transformer for 3D Object Manipulation | | | Manipulation |
| Physically Grounded Vision-Language Models for Robotic Manipulation | | | Manipulation |
| LATTE: LAnguage Trajectory TransformEr | | | Manipulation |
| LIV: Language-Image Representations and Rewards for Robotic Control | | | Manipulation |
| Gesture-Informed Robot Assistance via Foundation Models | | | Manipulation |
| SPRINT: Semantic Policy Pre-training via Language Instruction Relabeling | | | Manipulation |
| Language-Driven Representation Learning for Robotics | | | Manipulation |
| EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | | | Manipulation |
| VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | | | Manipulation |
| VIMA: General Robot Manipulation with Multimodal Prompts | | | Manipulation |
| Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks | | | Manipulation |
| Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control | | | Mobile Manipulation |
| Open-World Object Manipulation using Pre-Trained Vision-Language Models | | | Mobile Manipulation |
| Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | | | Mobile Manipulation |
| HomeRobot: Open-Vocabulary Mobile Manipulation | | | Mobile Manipulation |
| LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place | | | Mobile Manipulation |
| Spatial-Language Attention Policies | | | Mobile Manipulation |
| TidyBot: Personalized Robot Assistance with Large Language Models | | | Mobile Manipulation |
| SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning | | | Mobile Manipulation |
| Language to Rewards for Robotic Skill Synthesis | | | Language to Motion |
| Text2Motion: From Natural Language Instructions to Feasible Plans | | | Language to Motion |
| SayTap: Language to Quadrupedal Locomotion | | | Language to Motion |
| Natural Language Can Help Bridge the Sim2Real Gap | | | Sim2Real |
| ChatGPT for Robotics: Design Principles and Model Abilities | | | |
| SEAGULL: An Embodied Agent for Instruction Following through Situated Dialog | | | Dialogue, QA, |
| SQA3D: Situated Question Answering in 3D Scenes | | | Dialogue, QA, |
| Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners | | | Dialogue, QA, |
| Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning | | | Dialogue, QA, |
| DROC: Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections | | | Dialogue, QA, |
| Large Language Models as General Pattern Machines | | | |
| Language to Rewards for Robotic Skill Synthesis | | | |
| Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? | | | |
| MotionGPT | | | |
| Modeling Dynamic Environments with Scene Graph Memory | | | |
| Affordances from Human Videos as a Versatile Representation for Robotics | | | |
| RoboCat: A self-improving robotic agent | | | |
| ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification | | | |
| Language Is Not All You Need: Aligning Perception with Language Models | | | |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | | | |
| Behavior Transformers: Cloning k modes with one stone | | | |
| From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data | | | |
| Affordance Diffusion: Synthesizing Hand-Object Interactions | | | |
| Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment | | | |
| R3M: A Universal Visual Representation for Robot Manipulation | | | |
| Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics | | | |
| LangSplat: 3D Language Gaussian Splatting | | | |