CLAW @ CMU
By popular request, I’ve decided to occasionally dump some links here.
Planning
LLM Powered Autonomous Agents (blog)
LaMPP: Language Models as Probabilistic Priors for Perception and Action
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Do Embodied Agents Dream of Pixelated Sheep? Embodied Decision Making using Language Guided World Modelling
Grounding Classical Task Planners via Vision-Language Models
Language Models as Zero-Shot Trajectory Generators
Navigation
ViNT: A Foundation Model for Visual Navigation
SACSoN: Scalable Autonomous Data Collection for Social Navigation
A System for Generalized 3D Multi-Object Search
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms
Manipulation
RVT: Robotic View Transformer for 3D Object Manipulation
Physically Grounded Vision-Language Models for Robotic Manipulation
LATTE: LAnguage Trajectory TransformEr
LIV: Language-Image Representations and Rewards for Robotic Control
Gesture-Informed Robot Assistance via Foundation Models
SPRINT: Semantic Policy Pre-training via Language Instruction Relabeling
Language-Driven Representation Learning for Robotics
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
VIMA: General Robot Manipulation with Multimodal Prompts
Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks
Mobile Manipulation
Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control
Open-World Object Manipulation using Pre-Trained Vision-Language Models
Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
HomeRobot: Open-Vocabulary Mobile Manipulation
LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place
Spatial-Language Attention Policies
TidyBot: Personalized Robot Assistance with Large Language Models
SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning
Language to Motion
Language to Rewards for Robotic Skill Synthesis
Text2Motion: From Natural Language Instructions to Feasible Plans
SayTap: Language to Quadrupedal Locomotion
Sim-to-Real
Natural Language Can Help Bridge the Sim2Real Gap
Multi-Platform
ChatGPT for Robotics: Design Principles and Model Abilities
Dialogue, QA, Corrections, Pragmatics, …
SEAGULL: An Embodied Agent for Instruction Following through Situated Dialog
SQA3D: Situated Question Answering in 3D Scenes
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning
DROC: Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
Other
Large Language Models as General Pattern Machines
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
MotionGPT
Modeling Dynamic Environments with Scene Graph Memory
Affordances from Human Videos as a Versatile Representation for Robotics
RoboCat: A self-improving robotic agent
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification
Language Is Not All You Need: Aligning Perception with Language Models
Kosmos-2: Grounding Multimodal Large Language Models to the World
Behavior Transformers: Cloning k modes with one stone
From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data
Affordance Diffusion: Synthesizing Hand-Object Interactions
Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment
R3M: A Universal Visual Representation for Robot Manipulation
Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics
LangSplat: 3D Language Gaussian Splatting