Where is my robot butler?

  • Why is everyone using language models?
  • Why use learning based methods when control works?
  • What's the connection between "foundation" models and robotics?
  • Will GPT-5 be a robot? 😱
Household robots need to move beyond simple programmed tasks like those a Roomba performs and become full-fledged digital assistants.
A robotic agent that exists (physically) in the world gains access to rich and personalized knowledge of its environment. For example, it might be able to answer questions like: How much do things weigh? What's fragile? Or where you store the extra chocolates that you don't want anyone to find? Building an agent that can accomplish tasks requires integrating a diverse set of technologies and engineering: language models, SLAM, semantic mapping, task planning, understanding object affordances, and end effector control. This course will both cover foundational works in grounding language to action and analyze (or reimplement) state-of-the-art Large Language Model based task planners. This area is fascinating and difficult because it is so cross-cutting. Readings and topics are pulled from Robotics (CoRL, ICRA, RSS, HRI, IROS), Computer Vision (CVPR, I/ECCV), Natural Language Processing (ACL, EMNLP), and Machine Learning (ICLR, ICML, NeurIPS).

Projects will be scoped by prior hardware/simulator experience -- but knowledge of Deep Learning + one specialty (NLP/CV/Robotics) is basically required. Send Qs to Yonatan (ybisk@cs).

Topics
  • LLMs & Foundation Models
  • Instruction following & Dialogue
  • Task and Motion Planning
  • End-Effector & real-valued control
  • Semantic Mapping (2D and 3D)
  • World Models
Questions
  • How do you define or evaluate Dialogue?
  • Limitations of offline and unimodal pretraining
  • How does embodiment shape meaning?
  • Discrete vs continuous spaces and representations.
  • When is Sim2Real possible? What about manipulation?
  • I only have one brain, do I need more than one model?
Logistics
A basic course schedule is presented below; additional readings are here: Additional Readings
Tues
Thurs

What is embodiment? Designing your world

Aug 29: Philosophy
Aug 31: Origins of the Field
Sept 5: Simulators and Action Spaces
Sept 7: Project Discussion
Sept 12: Robo Basics (Lecture, no reading)
  • SLAM
  • Semantic Mapping
  • Inverse Kinematics
Sept 14: NLP and LLMs (Lecture, no reading)
  • Syntactic/Semantic Parsing
  • (Large) Language Models
  • Multimodal Transformers

Navigation

Sept 19: Discrete Worlds
Sept 21: Continuous Worlds
Sept 26: Mapping and Replanning
Sept 28: Real Valued output and Sim2Real
Oct 3: State Tracking and Task Planning
Oct 5: Language Models as Planners

Manipulation

Oct 10: Large Language Models (cont)
Oct 12: Manipulators & Representing Space
Fall Break
Oct 24: Scaling Manipulation
Oct 26: Imagination

Feedback, Dialogue, and Teaching

Oct 31: Concept Learning
Nov 2: Feedback
Nov 7: Election Day
Nov 9: CoRL
Nov 14: Daniel Fried
Nov 16: Asking for Help and Dialogue
Nov 21: Generating Language
Nov 23: No class: Thanksgiving
Nov 28: Theory-of-Mind and Open Challenges
Nov 30: Paper Presentations
We gotta wait and see what people publish! Here are a few things that haven't made it into the schedule yet
Dec 5: Paper Presentations
Dec 7: Final Presentations
This course is available as both a seminar (6 credits) and project based (12 credits) course.
                       6 Credit Seminar        12 Credit Project
Paper Summaries        5pts * 8 papers         5pts * 8 papers
- Student Paper        5pts * 1 paper          5pts * 1 paper
- Paper Presentation   10 pts                  10 pts
Project
- Proposal             15 pts (theoretical)    15 pts (practical)
- Final Report         30 pts                  30 pts (include implementation details or demo)
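As a quick sanity check on the arithmetic, the breakdown above can be tallied in a few lines of Python. This is only a sketch: the dictionary layout and the name `total_points` are made up for illustration, and the point values are copied from the breakdown (they are the same for the 6 and 12 credit versions).

```python
# Point values copied from the grading breakdown above.
# The dictionary keys and function name are illustrative, not official.
GRADING = {
    "paper_summaries": 5 * 8,    # 5pts * 8 papers
    "student_paper": 5 * 1,      # 5pts * 1 paper
    "paper_presentation": 10,
    "proposal": 15,
    "final_report": 30,
}


def total_points(components):
    """Sum the points available across all graded components."""
    return sum(components.values())


if __name__ == "__main__":
    print(total_points(GRADING))  # prints 100
```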
Proposal: Both seminar and project based students will write a proposal. While project students will go on to work on implementation, the seminar students should also go through the mental exercise of planning out what a system needs, what dependencies components have, where gradients might flow, etc. They will then get to revise their understanding in their final report.

Groups: Both seminar and project based assignments will be done in groups. Groups will likely be capped at five people.

Equal Participation: All reports must include a breakdown of each teammate's contributions.

Paper Summary (5/3pts)
  • What is the key insight of this paper, and what problem are they addressing?
  • What assumptions or simplifications do they make about the world or information flow?
  • What changes might enable better generalization to more realistic settings?
Project Proposal (15pts)
  • Task, Environment, and Skills Definitions
  • Minimal language covered and stretch goals
  • Failure recovery and replanning/feedback strategy
Midsemester Presentation (15pts)
  • Interactive demo of basic skills
  • Example of successful composition
  • Demonstration and analysis of failures
  • Proposal of changes for final demo (including rescoping)
Final Presentation (10pts)
  • Interactive demo of compositional instructions
  • Example of successful corrections/feedback
  • Demonstration and analysis of failures
Final Report (30pts)
  • 12 Credit: Technical write-up and specification of system (including code)
  • 12 Credit: Technical write-up of model design (including code)
  • All: Literature Review of state of the field
  • All: Discussion of key limitations to progress in this space
The course will center primarily on a few robot platforms or simulators, chosen based on course enrollment and the prior experience of students taking the class for 12 credits.

There are a couple other simulators/platforms I also like, which we can discuss as options.

Platform              Type                   Notes
VLN-CE                Simulated Navigation   Minimal hardware experience
                                             Proj: Language to Angle/Distance
                                             Teams: No limit on teams
DexArm                Simple gripper         Basic manipulation platform
                                             Proj: Language to 3D Space
                                             Teams: Two teams of ~4
Hello Robot Stretch   Mobile Manipulator     Requires skill specifications
                                             Proj: Language to ... let's decide
                                             Teams: (Probably) one team -- Let's see
                                             Control Code: Meta Home-Robot
                                             Collaborator: Chris Paxton
Late Assignments
  • All teams have 5 late days; these apply only to reports (not demos).
  • Paper summaries lose 1pt per day late
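
The paper summary penalty above works out as follows, in a minimal Python sketch. The function name is hypothetical, and flooring the score at zero is an assumption; the policy only says 1pt is lost per day late.

```python
def late_summary_score(earned_points, days_late):
    """Apply the 1pt-per-day late penalty to a paper summary score.

    Assumption (not stated in the policy): the score never drops below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return max(0, earned_points - days_late)


if __name__ == "__main__":
    # A 5pt summary turned in two days late.
    print(late_summary_score(5, 2))  # prints 3
```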

COVID Details:

In the event a student tests positive for COVID-19, they will be invited to attend discussion virtually and will be expected to participate as usual. This includes participation points for raising their hands with questions/answers and submission of lab-notebooks. Note that students who attend class while exhibiting symptoms will be told to leave and join virtually for the protection of all others present.

Accommodations for Students with Disabilities:

If you have a disability and have an accommodations letter from the Disability Resources office, we encourage you to discuss your accommodations and needs with the instructors as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

  1. Can we use other platforms? Yes! What robots do you have? Also check out AI Maker Space
  2. What about custom sensors and hardware? Same answer :)
  3. What about other simulators? Same answer :)
  4. LTI Curriculum Categories? 12 Hour version can be counted for a Task and a Lab
  5. Do I /need/ simulator experience? No, but plan to spend some time getting the engineering set up
  6. Can I attend discussion without registering? It's best to register (6hrs) even if you've finished your classes, since I need to prioritize time, energy, and space on registered students. I'll try and update this once I have a room confirmed with the registrar and see how much space we have in the class.