ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface

1Tsinghua University, 2University of California, Berkeley
* Equal contribution, Core contribution

Abstract

Tactile information plays a crucial role in enabling humans and robots to interact effectively with their environment, particularly in tasks that require an understanding of contact properties. Such dexterous manipulation tasks are often solved by imitation learning from demonstration datasets, which are typically collected via teleoperation systems and demand substantial time and effort. To address these challenges, we present ViTaMIn, an embodiment-free manipulation interface that integrates visual and tactile sensing into a hand-held gripper, enabling data collection without teleoperation. Our design employs a compliant Fin Ray gripper with tactile sensing, which lets operators perceive force feedback during manipulation and thus operate more intuitively. Additionally, we propose a multimodal representation learning strategy to obtain pre-trained tactile representations, improving data efficiency and policy robustness. Experiments on five contact-rich manipulation tasks show that ViTaMIn significantly outperforms baseline methods, demonstrating its effectiveness for complex manipulation tasks.

Task Demonstrations

Orange Placement

Test Tube Reorientation

Scissor Hanging

Sponge Insertion

Articulated Object Manipulation

Dynamic Peg Insertion

Knife Pulling

Generalization

Hardware Details

For those interested in building their own ViTaMIn interface, we provide the following resources:

Component List

Complete list of hardware components and their specifications


Assembly Guide

Step-by-step video tutorial for assembling the interface

Coming Soon

Sensor Tutorial

Detailed guide for fabricating the tactile sensors

Coming Soon

BibTeX