- DuoTact Visuotactile Sensor
- Generalizable Tactile Representation
- Robust Bimanual Pose Tracking
If you have any questions, please feel free to contact us at chuanyu.ne79@gmail.com
Handheld devices have opened up unprecedented opportunities to collect large-scale, high-quality demonstrations efficiently. However, existing systems often lack robust tactile sensing or reliable pose tracking to handle complex interaction scenarios, especially for bimanual and contact-rich tasks. In this work, we propose ViTaMIn-B, a more capable and efficient handheld data collection system for such tasks. We first design DuoTact, a novel compliant visuo-tactile sensor built with a flexible frame to withstand large contact forces during manipulation while capturing high-resolution contact geometry. To enhance the cross-sensor generalizability, we propose reconstructing the sensor's global deformation as a 3D point cloud and using it as the policy input. We further develop a robust, unified 6-DoF bimanual pose acquisition process using Meta Quest controllers, which eliminates the trajectory drift issue in common SLAM-based methods. Comprehensive user studies confirm the efficiency and high usability of ViTaMIn-B among novice and expert operators. Furthermore, experiments on four bimanual manipulation tasks demonstrate its superior task performance relative to existing systems.
ViTaMIn-B is a system developed for bimanual visuo-tactile data collection. The system integrates a GoPro Hero 10 camera for visual observation, Meta Quest 3 controllers for 6-DoF bimanual pose acquisition, and two DuoTact sensors for tactile sensing. The gripper width, which has a maximum span of 8 cm, is computed by detecting ArUco markers on the gripper.
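As a rough illustration of the marker-based width estimate, the sketch below converts the pixel distance between two detected markers into metres using the known marker size. It is not the authors' implementation: it assumes one marker per finger, markers roughly parallel to the image plane, and corner coordinates already supplied by some ArUco detector. The function names, the `marker_side_m` parameter, and the `closed_offset_m` calibration term are hypothetical.

```python
from math import hypot

MAX_WIDTH_M = 0.08  # maximum gripper span stated in the text (8 cm)


def marker_center(corners):
    """Mean of the four (x, y) corner points of one detected marker."""
    return (sum(p[0] for p in corners) / 4.0,
            sum(p[1] for p in corners) / 4.0)


def marker_side_px(corners):
    """Average side length of one marker, in pixels."""
    sides = [hypot(corners[i][0] - corners[(i + 1) % 4][0],
                   corners[i][1] - corners[(i + 1) % 4][1])
             for i in range(4)]
    return sum(sides) / 4.0


def gripper_width_m(left_corners, right_corners, marker_side_m,
                    closed_offset_m=0.0):
    """Estimate the gripper opening in metres from two finger markers.

    closed_offset_m (assumed calibration value) is the marker-to-marker
    distance measured when the gripper is fully closed.
    """
    # Metres per pixel, taken from the known physical marker size.
    mean_side_px = 0.5 * (marker_side_px(left_corners)
                          + marker_side_px(right_corners))
    scale = marker_side_m / mean_side_px
    lx, ly = marker_center(left_corners)
    rx, ry = marker_center(right_corners)
    width = hypot(rx - lx, ry - ly) * scale - closed_offset_m
    # Clamp to the physically possible range of the gripper.
    return min(max(width, 0.0), MAX_WIDTH_M)
```

The clamp keeps detection noise from producing widths outside the 0 to 8 cm range; a full implementation would more likely use the estimated 3D marker poses rather than a planar pixel scale.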
As bimanual manipulation demonstration collection occupies both hands, a foot pedal is used to trigger the start and end of recording, enabling efficient single-operator data collection.
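The pedal-triggered recording can be pictured as a simple toggle: one press starts an episode and the next press ends it. The following is a minimal sketch of that logic only. The `PedalRecorder` class and its method names are hypothetical, and how pedal events actually reach the host computer is not specified here.

```python
class PedalRecorder:
    """Toggle recording on each pedal press and group frames into episodes."""

    def __init__(self):
        self.recording = False
        self.episodes = []   # finished episodes, each a list of frames
        self._current = []   # frames of the episode in progress

    def on_pedal_press(self):
        """Start recording if idle, otherwise close the current episode."""
        if self.recording:
            self.episodes.append(self._current)
            self._current = []
        self.recording = not self.recording
        return self.recording

    def on_frame(self, frame):
        """Store a synchronized sensor frame only while recording."""
        if self.recording:
            self._current.append(frame)
```

Frames that arrive while the recorder is idle are dropped, so the operator can reposition objects between demonstrations without touching the host computer.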
Several improvements were introduced:
1. The novel visuotactile sensors (DuoTact) are developed to produce clearer tactile signals across diverse contact scenarios and to provide better contact support.
2. We replace SLAM-based tracking with Meta Quest 3 controllers, which provide accurate, real-time 6-DoF poses for both handheld devices.
3. The mechanical structure is originally designed for improved ergonomics and reduced weight, removing onboard computing (e.g., Raspberry Pi) and interfacing all sensors directly with the host computer.
4. All sensing modalities are latency-calibrated and synchronized to ensure precise spatiotemporal alignment.
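One common way to realize the alignment in item 4 is to subtract a calibrated per-sensor latency from each timestamp and then match every reference frame to the nearest sample of the other streams. The sketch below illustrates that idea. It is not the authors' pipeline: the latency values, the `max_gap_s` tolerance, and all function names are assumptions, and each stream is assumed to be non-empty and sorted by time.

```python
from bisect import bisect_left


def correct_latency(timestamps, latency_s):
    """Shift receive timestamps back by a calibrated sensor latency."""
    return [t - latency_s for t in timestamps]


def nearest_index(sorted_times, t):
    """Index of the sample in sorted_times closest to time t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    return i if (after - t) < (t - before) else i - 1


def align(reference_times, stream_times, max_gap_s):
    """For each reference time, return the nearest stream index,
    or None when the nearest sample is further away than max_gap_s."""
    matches = []
    for t in reference_times:
        j = nearest_index(stream_times, t)
        matches.append(j if abs(stream_times[j] - t) <= max_gap_s else None)
    return matches


# Hypothetical example: camera frames lag by 50 ms, controller poses by 10 ms.
camera = correct_latency([0.10, 0.20, 0.30], 0.05)
poses = correct_latency([0.06, 0.11, 0.16, 0.21, 0.26], 0.01)
print(align(camera, poses, max_gap_s=0.02))  # [0, 2, 4]
```

Returning `None` for samples outside the tolerance lets downstream code drop frames with missing modalities instead of silently pairing them with stale data.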
(1) A PVC film is inserted into the mold cavity and coated with a transparent silicone adhesive. Subsequently, 10g of transparent silicone gel (Wacker Elastosil® RT 601, A:B = 9:1 by weight) is poured into the mold and cured at 60°C for 30 minutes.
(2) A reflective coating, consisting of Posilicone Translucent silicone (A:B = 1:1 by weight) and white pigment (2:0.1 weight ratio), is applied onto the cured transparent silicone surface. This layer is then cured at 60°C for 20 minutes.
(3) A black coating, consisting of Novocs Matte matting agent, Ecoflex 00-10 (A:B = 1:1 by weight), and black pigment (26:6:1 weight ratio), is uniformly airbrushed over the reflective layer and dried at 60°C for 20 minutes.
(4) For final assembly, the two fabricated contact layers are slid into slots on the TPU frame. A black rubber sheet is then attached to the frame's exterior via heat sealing. Subsequently, the LED strip light is threaded through designated slots on the frame. Finally, the RGB camera and the TPU frame assembly are mounted onto the finger bracket using screws and nuts.
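The weight ratios in steps (1) to (3) translate into component masses by dividing the batch mass in proportion to the ratio. The helper below is a small worked example of that arithmetic. The function name is hypothetical, and the 33 g batch used for the black coating is an assumed quantity chosen for round numbers, since only the 10 g transparent gel batch is specified above.

```python
def split_by_weight(total_g, ratio):
    """Split a batch mass in grams into components by a weight ratio."""
    parts = sum(ratio)
    return [total_g * r / parts for r in ratio]


# Step (1): 10 g of transparent gel at A:B = 9:1.
print(split_by_weight(10, (9, 1)))       # [9.0, 1.0]

# Step (3): matting agent : Ecoflex 00-10 : black pigment = 26:6:1,
# for an assumed 33 g batch (batch size not given in the text).
matting_g, ecoflex_g, pigment_g = split_by_weight(33, (26, 6, 1))
print(matting_g, ecoflex_g, pigment_g)   # 26.0 6.0 1.0

# The Ecoflex share is itself mixed at A:B = 1:1.
print(split_by_weight(ecoflex_g, (1, 1)))  # [3.0, 3.0]
```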
Diagram of the fabrication process for DuoTact
Principle and result diagram of point cloud reconstruction.
@article{li2025vitamin,
title={ViTaMIn-B: A Reliable and Efficient Visuo-Tactile Bimanual Manipulation Interface},
author={Li, Chuanyu and Liu, Chaoyi and Wang, Daotan and Zhang, Shuyu and Li, Lusong and Zeng, Zecui and Liu, Fangchen and Xu, Jing and Chen, Rui},
journal={arXiv preprint arXiv:2511.05858},
year={2025}
}