| Title | Venue | Year | Topic |
| --- | --- | --- | --- |
| Minds, brains, and programs | | 1980 | Philosophy |
| The symbol grounding problem | | 1990 | Philosophy |
| Scripts, Plans, Goals, and Understanding | | 1977 | Origins |
| Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data | | 2020 | Philosophy |
| Experience Grounds Language | | 2020 | Philosophy |
| Robots That Use Language | | 2020 | Philosophy |
| Spoken language interaction with robots: Recommendations for future research | | 2022 | Philosophy |
| SHRDLU | MIT Report | 1971 | Origins |
| Walk the talk: connecting language, knowledge, and action in route instructions | AAAI | 2006 | Origins |
| Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions | TACL | 2013 | Origins |
| RLBench | | | Simulation |
| AI2Thor | | | Simulation |
| ManipulaThor | | | Simulation |
| PyBullet | | | Simulation |
| Isaac Gym | | | Simulation |
| Omniverse | | | Simulation |
| RoboCasa | | | Simulation |
| TextWorld | | | Simulation |
| VirtualHome | | | Simulation |
| ProcThor | | | Simulation |
| RoboThor | | | Simulation |
| AirSim | | | Simulation |
| iGibson | | | Simulation |
| Habitat | | | Simulation |
| ThreeDWorld | | | Simulation |
| Behavior-1K | | | Simulation |
| Learning to Interpret Natural Language Navigation Instructions from Observations | AAAI | 2011 | Discrete |
| Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments | CVPR | 2018 | Discrete |
| Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding | EMNLP | 2020 | Discrete |
| Learning to Follow Navigational Directions | ACL | 2010 | Continuous |
| Vision and Language Navigation in Continuous Environments | ECCV | 2020 | Continuous |
| Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction | CoRL | 2018 | Continuous |
| CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory | RSS | | Continuous |
| Learning Semantic Maps from Natural Language Descriptions | RSS | 2013 | Mapping |
| Self-Monitoring Navigation Agent via Auxiliary Progress Estimation | ICLR | 2019 | Replanning |
| Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation | CVPR | 2019 | Replanning |
| Emergence of Maps in the Memories of Blind Navigation Agents | ICLR | 2023 | Mapping |
| Iterative Vision-and-Language Navigation | CVPR | 2023 | Replanning |
| Sim-to-Real Transfer for Vision-and-Language Navigation | CoRL | 2020 | Sim2Real |
| Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation | ICRA | 2021 | Sim2Real |
| Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight | CoRL | 2019 | Continuous |
| Natural Language Communication with Robots | NAACL | 2016 | Real values |
| ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks | CVPR | 2020 | State Tracking and Task Planning |
| A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution | CoRL | 2021 | State Tracking and Task Planning |
| FILM: Following Instructions in Language with Modular Methods | ICLR | 2022 | State Tracking and Task Planning |
| Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | | 2022 | LM Planners |
| ProgPrompt: Generating Situated Robot Task Plans using Large Language Models | | 2023 | LM Planners |
| Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following | | 2023 | LM Planners |
| Code as Policies: Language Model Programs for Embodied Control | | | LM Planners |
| Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | | 2022 | LM Planners |
| Visual Language Maps for Robot Navigation | | 2023 | Large _____ Models |
| Grounding Language with Visual Affordances over Unstructured Data | | 2023 | Large _____ Models |
| CLIPort: What and Where Pathways for Robotic Manipulation | | 2021 | Large _____ Models |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | | 2022 | Large _____ Models |
| Transporter Networks: Rearranging the Visual World for Robotic Manipulation | | 2020 | Manipulators & Representing Space |
| VIMA: General Robot Manipulation with Multimodal Prompts | NeurIPS FMDM Workshop | 2022 | Manipulators & Representing Space |
| Language Conditioned Imitation Learning over Unstructured Data | | 2021 | Manipulators & Representing Space |
| VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation | NeurIPS Datasets and Benchmarks | 2022 | Manipulators & Representing Space |
| Chasing Ghosts: Instruction Following as Bayesian State Tracking | NeurIPS | 2019 | Imagination |
| Prospection: Interpretable Plans From Language By Predicting the Future | ICRA | 2019 | Imagination |
| Learning Universal Policies via Text-Guided Video Generation | | 2023 | Imagination |
| Diffusion-based Generation, Optimization, and Planning in 3D Scenes | | 2023 | Imagination |
| Mastering Diverse Domains through World Models | | 2023 | Imagination |
| DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics | | 2023 | Imagination |
| Scaling Robot Learning with Semantically Imagined Experience | | 2023 | Imagination |
| Reward-rational (implicit) choice: A unifying formalism for reward learning | | | Pragmatics |
| Legibility and Predictability of Robot Motion | | | Pragmatics |
| Learning Language Games through Interaction | ACL | 2016 | Concept Learning |
| Naturalizing a Programming Language via Interactive Learning | ACL | 2017 | Concept Learning |
| Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following | CoRL | 2020 | Concept Learning |
| Learning from Unscripted Deictic Gesture and Language for Human-Robot Interactions | AAAI | 2014 | Concept Learning |
| Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy” | IJCAI | 2016 | Concept Learning |
| Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | | | Concept Learning |
| Correcting Robot Plans with Natural Language Feedback | | | Feedback |
| DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents | | 2022 | Feedback |
\"No, to the Right\" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy |
|
2023 |
Feedback |
| Interactive Language: Talking to Robots in Real Time | | 2022 | Feedback |
| Improving Grounded Natural Language Understanding through Human-Robot Dialog | ICRA | 2019 | Feedback |
| Asking for Help Using Inverse Semantics | RSS | 2014 | Feedback |
| Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning | EMNLP | 2019 | Feedback |
| Miscommunication Detection and Recovery in Situated Human–Robot Dialogue | ACM Transactions on Interactive Intelligent Systems | 2019 | Feedback |
| Speaker-Follower Models for Vision-and-Language Navigation | NeurIPS | 2018 | Feedback |
| Cooperative Vision-and-Dialog Navigation | CoRL | 2019 | Feedback |
| Back to the Blocks World: Learning New Actions through Situated Human-Robot Dialogue | SigDial | 2014 | Feedback |
| TEACh: Task-driven Embodied Agents that Chat | AAAI | 2022 | Feedback |
| Collaborative Dialogue in Minecraft | ACL | 2019 | Feedback |
| MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks | EMNLP | 2021 | Feedback |
| DANLI: Deliberative Agent for Following Natural Language Instructions | EMNLP | 2022 | Feedback |
| Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue | EMNLP | 2022 | Feedback |
| LLM Powered Autonomous Agents (blog) | | | Planning |
| LaMPP: Language Models as Probabilistic Priors for Perception and Action | | | Planning |
| LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | | | Planning |
| Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling | | | Planning |
| Grounding Classical Task Planners via Vision-Language Models | | | Planning |
| Language Models as Zero-Shot Trajectory Generators | | | Planning |
| ViNT: A Foundation Model for Visual Navigation | | | Navigation |
| SACSoN: Scalable Autonomous Data Collection for Social Navigation | | | Navigation |
| A System for Generalized 3D Multi-Object Search | | | Navigation |
| CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation | | | Navigation |
| Principles and Guidelines for Evaluating Social Robot Navigation Algorithms | | | Navigation |
| RVT: Robotic View Transformer for 3D Object Manipulation | | | Manipulation |
| Physically Grounded Vision-Language Models for Robotic Manipulation | | | Manipulation |
| LATTE: LAnguage Trajectory TransformEr | | | Manipulation |
| LIV: Language-Image Representations and Rewards for Robotic Control | | | Manipulation |
| Gesture-Informed Robot Assistance via Foundation Models | | | Manipulation |
| SPRINT: Semantic Policy Pre-training via Language Instruction Relabeling | | | Manipulation |
| Language-Driven Representation Learning for Robotics | | | Manipulation |
| EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought | | | Manipulation |
| VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | | | Manipulation |
| VIMA: General Robot Manipulation with Multimodal Prompts | | | Manipulation |
| Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks | | | Manipulation |
| Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control | | | Mobile Manipulation |
| Open-World Object Manipulation using Pre-Trained Vision-Language Models | | | Mobile Manipulation |
| Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models | | | Mobile Manipulation |
| HomeRobot: Open-Vocabulary Mobile Manipulation | | | Mobile Manipulation |
| LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place | | | Mobile Manipulation |
| Spatial-Language Attention Policies | | | Mobile Manipulation |
| TidyBot: Personalized Robot Assistance with Large Language Models | | | Mobile Manipulation |
| SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning | | | Mobile Manipulation |
| Language to Rewards for Robotic Skill Synthesis | | | Language to Motion |
| Text2Motion: From Natural Language Instructions to Feasible Plans | | | Language to Motion |
| SayTap: Language to Quadrupedal Locomotion | | | Language to Motion |
| Natural Language Can Help Bridge the Sim2Real Gap | | | Sim2Real |
| ChatGPT for Robotics: Design Principles and Model Abilities | | | |
| SEAGULL: An Embodied Agent for Instruction Following through Situated Dialog | | | Dialogue, QA, |
| SQA3D: Situated Question Answering in 3D Scenes | | | Dialogue, QA, |
| Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners | | | Dialogue, QA, |
| Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning | | | Dialogue, QA, |
| DROC: Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections | | | Dialogue, QA, |
| Large Language Models as General Pattern Machines | | | |
| Language to Rewards for Robotic Skill Synthesis | | | |
| Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? | | | |
| MotionGPT | | | |
| Modeling Dynamic Environments with Scene Graph Memory | | | |
| Affordances from Human Videos as a Versatile Representation for Robotics | | | |
| RoboCat: A self-improving robotic agent | | | |
| ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification | | | |
| Language Is Not All You Need: Aligning Perception with Language Models | | | |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | | | |
| Behavior Transformers: Cloning k modes with one stone | | | |
| From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data | | | |
| Affordance Diffusion: Synthesizing Hand-Object Interactions | | | |
| Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment | | | |
| R3M: A Universal Visual Representation for Robot Manipulation | | | |
| Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics | | | |
| LangSplat: 3D Language Gaussian Splatting | | | |