Object-Centric Dexterous Manipulation from Human Motion Data

CoRL 2024

1Stanford University, 2Peking University


We introduce a hierarchical framework that uses human hand motion data and deep reinforcement learning to train dexterous robot hands for effective object-centric manipulation in both simulation and the real world.


Despite being trained only in simulation, our system demonstrates zero-shot transfer to two real-world robots, each equipped with a dexterous hand.

Abstract

Manipulating objects to achieve desired goal states is a basic but important skill for dexterous manipulation. Human hand motions demonstrate proficient manipulation capability, providing valuable data for training robots with multi-finger hands. Despite this potential, substantial challenges arise due to the embodiment gap between human and robot hands. In this work, we introduce a hierarchical policy learning framework that uses human hand motion data for training object-centric dexterous robot manipulation. At the core of our method is a high-level trajectory generative model, learned with a large-scale human hand motion capture dataset, to synthesize human-like wrist motions conditioned on the desired object goal states. Guided by the generated wrist motions, deep reinforcement learning is further used to train a low-level finger controller that is grounded in the robot's embodiment to physically interact with the object to achieve the goal. Through extensive evaluation across 10 household objects, our approach not only demonstrates superior performance but also showcases generalization capability to novel object geometries and goal states. Furthermore, we transfer the learned policies from simulation to a real-world bimanual dexterous robot system, further demonstrating its applicability in real-world scenarios.

Method



(A) Training: First, we use human motion capture data to train a generative model that synthesizes dual-hand wrist trajectories conditioned on the object trajectory. We then use RL to train a low-level robot controller conditioned on the dual-hand trajectories generated by the trained high-level planner. During this process, we augment the data in simulation to improve the high-level planner and the low-level controller simultaneously. (B) Inference: Given a single object goal trajectory, our framework generates a dual-hand reference trajectory that guides the low-level controller to accomplish the task.
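The two-level inference pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: `high_level_planner` stands in for the learned generative model (here replaced by fixed per-hand offsets), and `low_level_controller` stands in for the RL finger/wrist policy; all function names and the offset values are hypothetical.

```python
import numpy as np

def high_level_planner(object_goal_traj):
    """Stand-in for the learned generative model: maps an object goal
    trajectory (T x 7 poses: xyz + quaternion) to dual-hand wrist
    reference trajectories. A fixed lateral offset per hand replaces
    the learned, motion-capture-trained model."""
    offsets = {"left": np.array([0.0, -0.15, 0.0]),
               "right": np.array([0.0, 0.15, 0.0])}
    return {hand: object_goal_traj[:, :3] + off
            for hand, off in offsets.items()}

def low_level_controller(wrist_refs, t):
    """Stand-in for the RL policy: returns a per-hand target given the
    wrist references at timestep t. A real policy would also consume
    proprioception and the current object state."""
    return {hand: ref[t] for hand, ref in wrist_refs.items()}

# Inference loop: one object goal trajectory drives both levels.
T = 50
goal_traj = np.zeros((T, 7))
goal_traj[:, 0] = np.linspace(0.0, 0.3, T)  # object translates along x
wrist_refs = high_level_planner(goal_traj)
actions = [low_level_controller(wrist_refs, t) for t in range(T)]
```

The key design point this sketch mirrors is that the low-level controller never sees the object goal directly; it only tracks the wrist references produced by the high-level planner, which keeps the controller grounded in the robot's embodiment.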

Experiments

Environment Setups


Overview of the environment setups:
(a) Simulation workspace. We employ two Shadow Hands, each mounted on a separate UR10e robot, arranged side by side.
(b) Object sets in simulation and the real world.
(c) Real-world workspace, mirroring the simulation; the robot system uses the same Shadow Hands and UR10e robots as the simulation.

Importing Human Motion Capture Data into Simulation




The upper-left video shows the object goal trajectory input, and the upper-right video shows the high-level planner output (generated wrist motions).

Optimizing Human Mocap Data via Reinforcement Learning




The upper-left video shows the object goal trajectory input, and the upper-right video shows the high-level planner output (generated wrist motions).

Learning Object-Centric Dexterous Manipulation in Isaac Gym




Upper-Left Video: Object goal trajectories from human mocap data (ARCTIC dataset), which serve as the input to our policy.
Upper-Right Video: High-level planner output (generated wrist motions). Object motion is replayed for visualization.
Lower Video: Low-level policy output (finger + wrist motion). Fully autonomous results; no object motion replay.












Additional Experiments


Experiments on different embodiments. We apply our method to four types of multi-fingered dexterous hands, varying in size and degrees of freedom. Our method achieves a completion rate above 50% for all hands, demonstrating that our framework can effectively transfer human data to different robot hand embodiments.

Quantitative results in the Building Blocks task



Conclusion

In this work, we present a hierarchical policy learning framework that effectively utilizes human hand motion data to train object-centric dexterous robot manipulation. At the core of our method is a high-level trajectory generative model trained on a large-scale human hand motion capture dataset, which synthesizes human-like wrist motions conditioned on the object goal trajectory. Guided by these wrist motions, we further train an RL-based low-level finger controller to achieve the task goal. Our approach demonstrates superior performance across various household objects and generalizes to novel object geometries and goal trajectories. Moreover, the successful transfer of the learned policies from simulation to a real-world bimanual dexterous robot system underscores the practical applicability of our method in real-world scenarios.

BibTeX

@inproceedings{chenobject,
  title={Object-Centric Dexterous Manipulation from Human Motion Data},
  author={Chen, Yuanpei and Wang, Chen and Yang, Yaodong and Liu, Karen},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024}
}