A lightweight predictive latent world model for real-time latency mitigation in vision-based teleoperation.
Problem. Vision-based teleoperation suffers from stale visual feedback under communication latency. TeleopWM predicts short-horizon future observations and action trends to support latency-resilient predictive display.
The main contributions of this work are summarized as follows:
TeleopWM maintains strong steering correlation across the 8-step prediction horizon, while longitudinal error increases gradually with horizon length.
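The per-horizon behavior described above can be checked with a short script. The following is a minimal sketch, not the evaluation code behind the reported results. It assumes predictions and ground truth are stored as lists of samples with one value per horizon step, and it assumes Pearson correlation for steering and mean absolute error for the longitudinal signal, since the exact metrics are not stated here. The function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_step_metrics(pred_steer, true_steer, pred_long, true_long):
    """Return (steering correlations, longitudinal MAEs), one entry per horizon step.

    Each argument is a list of samples; each sample is a list with one value
    per prediction step (assumed layout, not taken from the source).
    """
    horizon = len(true_steer[0])
    steer_corr, long_mae = [], []
    for k in range(horizon):
        # Correlate predicted and true steering across samples at step k.
        steer_corr.append(
            pearson([s[k] for s in pred_steer], [s[k] for s in true_steer])
        )
        # Average absolute longitudinal error across samples at step k.
        errors = [abs(p[k] - t[k]) for p, t in zip(pred_long, true_long)]
        long_mae.append(sum(errors) / len(errors))
    return steer_corr, long_mae


# Toy data: 3 samples, 2 horizon steps (hypothetical values for illustration).
true_steer = [[0.0, 0.1], [0.5, 0.4], [-0.5, -0.3]]
pred_steer = [[0.0, 0.1], [0.5, 0.4], [-0.5, -0.3]]
true_long = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
pred_long = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]]

steer_corr, long_mae = per_step_metrics(pred_steer, true_steer, pred_long, true_long)
print("steering correlation per step:", steer_corr)
print("longitudinal MAE per step:", long_mae)
```

With real rollouts, the two lists would each have 8 entries, one per prediction step, which makes the trend over the horizon easy to plot or tabulate.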
| Category | Metric | Value |
|---|---|---|
| Rollout prediction | Horizon | 8 frames (approximately 533 ms at 15 FPS) |
| Future action prediction | Outputs | longitudinal and steering trends |
| Runtime | Inference latency | 38.9 ms per rollout |
| Runtime | Prediction rate | 205.5 FPS |
| Runtime | Peak VRAM | 1.24 GB |
| Resolution | Input/output | 320 × 512 px |
Runtime values are reference measurements from the final paper configuration and should be re-measured on target hardware.
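To re-measure on target hardware, a small timing harness is usually enough. The sketch below is illustrative only: `dummy_rollout` is a hypothetical stand-in for one 8-frame model rollout, and the helper names are not part of any released API. It also reproduces the table's arithmetic, assuming that the prediction rate counts predicted frames per second of wall-clock time (8 frames divided by the rollout latency), which the table does not state explicitly.

```python
import statistics
import time

HORIZON_FRAMES = 8            # from the table
STREAM_FPS = 15.0             # from the table
REFERENCE_LATENCY_MS = 38.9   # from the table; hardware-dependent


def horizon_ms(frames, fps):
    """Time span covered by `frames` predicted frames at `fps`, in milliseconds."""
    return 1000.0 * frames / fps


def predicted_fps(frames, latency_ms):
    """Predicted frames per second of wall-clock time (assumed definition)."""
    return frames / (latency_ms / 1000.0)


def measure_latency_ms(rollout_fn, warmup=5, runs=50):
    """Median wall-clock time of rollout_fn() in milliseconds.

    For a GPU model, rollout_fn must block until the result is ready
    (for example by synchronizing the device); otherwise only the
    launch overhead is timed.
    """
    for _ in range(warmup):
        rollout_fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        rollout_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def dummy_rollout():
    # Hypothetical placeholder workload; replace with one real model rollout.
    return sum(i * i for i in range(10_000))


horizon = horizon_ms(HORIZON_FRAMES, STREAM_FPS)
reference_rate = predicted_fps(HORIZON_FRAMES, REFERENCE_LATENCY_MS)
measured_ms = measure_latency_ms(dummy_rollout)
measured_rate = predicted_fps(HORIZON_FRAMES, measured_ms)

print(f"horizon: {horizon:.1f} ms")
print(f"reference rate: {reference_rate:.1f} predicted frames/s")
print(f"measured: {measured_ms:.3f} ms per rollout, {measured_rate:.1f} predicted frames/s")
```

Under that assumed definition, the rounded 38.9 ms latency gives roughly 205.7 predicted frames per second, which agrees with the reported 205.5 FPS to within the rounding of the latency figure. The reference latency is also shorter than the 66.7 ms frame interval at 15 FPS, so on comparable hardware a rollout would finish before the next camera frame arrives.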
Public release artifacts are hosted on GitHub and Hugging Face.
@misc{teleopwm2026,
title={TeleopWM: A Real-Time Predictive World Model for Latency-Resilient Vision-Based Teleoperation},
author={Khalil, Aws and Kwon, Jaerock},
year={2026},
note={Under review}
}