With: Yoonyoung (Jamie) Cho

Why Aren't Robots in Our Homes Today?

When we open up our phones these days and scroll through, we often see cool robot demo videos pop up in our feeds. You can’t help but wonder: these robots are so cool, but why are they not in our homes yet?

The main problem is the vast diversity of situations that we face every day in the real world: a robot may be trained to pick and place an object really well, but what happens if the object is too heavy to lift, too large to grasp with one hand, or occluded behind a wall? While a human would push, topple, or roll such objects, robots that lack these skills might get stuck, unsure of what to do.

Our mission is to create robots that command a wide diversity of skills and can adapt them to accomplish the task at hand, even for something as seemingly simple as transporting objects in a warehouse.

https://drive.google.com/file/d/1BX3M1T6GtHDepUgUhzSGH9kxTmLXdD7Z/view?usp=sharing

https://drive.google.com/file/d/1NG6ZPAaK5oZ_m9OlnroE4LsA7c46YM5C/view?usp=sharing

Figure: Two different scenarios for manipulating objects to a target pose in shelves within a warehouse-like environment.

The Core Challenge

This problem is hard because there’s no ‘one-size-fits-all’ solution for every task. Even in the above videos, you sometimes need to use both hands to grasp the object (left) or first use one hand, then transfer the object to the other (right), depending on the task and the object.

While Reinforcement Learning (RL) has emerged as a powerful tool for acquiring new skills, its reach towards general-purpose robots has been relatively limited: RL policies are typically very good at learning one particular task, but they often struggle to learn across a wide diversity of objects and surroundings, since each one presents a different set of constraints.

When you try to learn many different things at once, it’s like you’re rowing a boat with many folks aboard that want to reach different destinations. With so many people rowing in different directions, you may never get to any of the targeted destinations at all! Likewise, when a single, monolithic neural network attempts to learn across multiple skills simultaneously, it struggles: the learning signals for different tasks conflict, causing the agent to learn slowly or fail to learn altogether.
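To make the rowing analogy concrete, here is a minimal sketch of conflicting learning signals. It assumes two toy quadratic losses that share one parameter vector; the targets and function names are hypothetical and are not taken from our experiments.

```python
import math

def grad_quadratic(w, target):
    """Gradient of the toy loss ||w - target||^2 with respect to w."""
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine similarity; negative values mean the gradients conflict."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))

# Shared parameters and two hypothetical task optima on opposite sides.
w = [0.0, 0.0]
target_a = [1.0, 0.0]
target_b = [-1.0, 0.2]

g_a = grad_quadratic(w, target_a)  # pulls w towards task A
g_b = grad_quadratic(w, target_b)  # pulls w towards task B
g_sum = [a + b for a, b in zip(g_a, g_b)]

conflict = cosine(g_a, g_b)
print("cosine similarity:", conflict)
print("per-task gradient norms:", norm(g_a), norm(g_b))
print("summed gradient norm:", norm(g_sum))
```

The two gradients point in nearly opposite directions, so the summed update is much smaller than either per-task update: the boat barely moves.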

Our Approach: Learning to Specialize and Compose

How do we solve this? We can actually take inspiration from neuroscience: the human brain isn't one singular, giant network where all neurons are firing all the time; it’s a highly modularized system. In motor control, for instance, biological neural networks orchestrate motion with motor modules, a set of neurons that collectively activate to realize a specific motion, which are then combined as needed to produce the target behavior.

We apply this intuition to robot learning: our research investigates modular neural network architectures that adaptively reconfigure themselves by combining different modules based on the task at hand. This allows the system to specialize and compose: each module can become very good at a specific subproblem, rather than trying to learn the hard (and often infeasible) general solution that works in all possible cases, and the modules are then composed and reused across different contexts as necessary.
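As a rough illustration of specializing and composing, the sketch below blends a few hand-written modules using gate weights. It is a toy under stated assumptions, not the architecture we use: the module functions, their names, and the gate scores are all hypothetical, and in a learned system both the modules and the gate would be trained networks.

```python
import math

def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical specialist modules: each maps an observation to an action.
def push_module(obs):
    return [0.5 * x for x in obs]

def topple_module(obs):
    return [-x for x in obs]

def roll_module(obs):
    return [x + 1.0 for x in obs]

MODULES = [push_module, topple_module, roll_module]

def compose(obs, gate_scores):
    """Blend the module outputs using task-dependent gate weights."""
    weights = softmax(gate_scores)
    outputs = [module(obs) for module in MODULES]
    action = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(obs))
    ]
    return action, weights

# Hand-set gate scores standing in for a learned, task-conditioned gate.
obs = [1.0, 2.0]
action, weights = compose(obs, gate_scores=[10.0, 0.0, 0.0])
print("gate weights:", weights)
print("composed action:", action)
```

With the gate strongly favoring the first module, the composed action is almost exactly that module's output; changing the gate scores reuses the same modules in a different mixture.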

In prior work (HAMNet@RSS2025), we applied this insight and proposed HAMNet (Hierarchical And Modular Network), which learned a highly diverse and generalizable suite of non-prehensile manipulation skills for pushing, toppling, and rolling objects of diverse shapes across a wide range of environments such as tables, cabinets, sinks, and baskets.

What’s Ahead: Scaling Modular Skills

Bimanual Dexterous Manipulation

Locomotion

We are now extending this research to answer the next big questions: