TeleopWM project page STATUS: under review

TeleopWM: A Real-Time Predictive World Model for Latency-Resilient Vision-Based Teleoperation

A lightweight predictive latent world model for real-time latency mitigation in vision-based teleoperation.

Aws Khalil · Jaerock Kwon
Bio-Inspired Machine Intelligence (BIMI) Lab, University of Michigan – Dearborn

Demo Video

Research Overview

Problem. Vision-based teleoperation suffers from stale visual feedback under communication latency. TeleopWM predicts short-horizon future observations and action trends to support latency-resilient predictive display.
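One way to picture predictive display: if the communication delay is known, the operator can be shown the predicted frame whose lead time best matches that delay. The sketch below is illustrative only. `select_display_step` and its parameters are hypothetical names and not part of TeleopWM; the defaults (15 FPS, 8-step horizon) are taken from the metrics reported on this page.

```python
def select_display_step(delay_ms: float, fps: float = 15.0, horizon: int = 8) -> int:
    """Pick the predicted step whose lead time is closest to the measured delay.

    Step k is assumed to lie k / fps seconds ahead of the latest received frame.
    Step 0 means "show the latest received frame unchanged".
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    # Convert the delay into a (fractional) number of frames, then round.
    steps = round(delay_ms / 1000.0 * fps)
    # Delays beyond the rollout horizon fall back to the farthest prediction.
    return max(0, min(horizon, steps))


if __name__ == "__main__":
    for delay in (0, 100, 200, 400, 800):
        print(delay, "ms ->", "step", select_display_step(delay))
```

Delays longer than the horizon (about 533 ms at 15 FPS) cannot be fully compensated in this scheme, which is why the sketch clamps to the last predicted step.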

The main contributions of this work are summarized as follows:

  • We propose TeleopWM, a lightweight predictive latent framework for latency-resilient vision-based teleoperation that jointly supports predictive display and future action forecasting.
  • We introduce a motion-aware future action prediction strategy that estimates future driving behavior from latent motion dynamics rather than static latent appearance representations.
  • We demonstrate that TeleopWM runs in real time with a small memory footprint while producing stable predictive visual rollouts and multi-step future action forecasts under teleoperation-oriented constraints.
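To illustrate the distinction drawn in the second contribution, the toy sketch below feeds frame-to-frame latent differences, rather than the latents themselves, into an action head. This is an assumption-laden illustration, not the TeleopWM architecture: the page does not describe the actual network, and the linear head, the function names, and the (longitudinal, steering) output order are all hypothetical.

```python
import numpy as np


def latent_motion_features(latents: np.ndarray) -> np.ndarray:
    """Turn a (T, D) latent sequence into (T-1, D) frame-to-frame differences."""
    if latents.ndim != 2 or latents.shape[0] < 2:
        raise ValueError("latents must have shape (T, D) with T >= 2")
    return np.diff(latents, axis=0)


def predict_future_actions(latents: np.ndarray,
                           weights: np.ndarray,
                           bias: np.ndarray) -> np.ndarray:
    """Hypothetical linear head: motion features -> (H, 2) future actions.

    weights has shape ((T-1) * D, H * 2) and bias has shape (H * 2,).
    Column 0 is taken to be longitudinal and column 1 steering (assumed order).
    """
    motion = latent_motion_features(latents).reshape(-1)
    return (motion @ weights + bias).reshape(-1, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, H = 4, 16, 8  # context length, latent size, horizon (toy values)
    weights = rng.normal(size=((T - 1) * D, H * 2))
    bias = np.zeros(H * 2)
    latents = rng.normal(size=(T, D))
    print(predict_future_actions(latents, weights, bias).shape)
```

Because the head only sees differences, a sequence of identical latents (no motion) yields the bias alone regardless of what the scene looks like, which is the sense in which the prediction depends on dynamics rather than static appearance.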

Method: TeleopWM Predictive World Model

Figure: TeleopWM method overview. TeleopWM uses recent visual observations and control inputs to predict future visual feedback and action trends for latency-resilient teleoperation.

Qualitative Rollout Results

Figure: TeleopWM qualitative rollout results. Representative 8-step future RGB rollouts and action alignment across straight, mild-turn, sharp-turn, and intersection scenarios.

Future Action Prediction

Figure: Future action prediction metrics. Per-step future action error and correlation for longitudinal and steering predictions.

TeleopWM maintains strong steering correlation over the 8-step prediction horizon while longitudinal error increases gradually with horizon length.
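Per-step error and correlation of this kind can be computed along the following lines. This is a minimal sketch under stated assumptions: the page does not say which error metric is used, so mean absolute error stands in for it here, correlation is Pearson correlation across samples, and the array layout (samples, horizon steps, two action channels) and function name are hypothetical.

```python
import numpy as np


def per_step_action_metrics(pred, target):
    """Compute per-step MAE and Pearson correlation for future actions.

    pred, target: arrays of shape (N, H, 2), where channel 0 is assumed to be
    longitudinal and channel 1 steering. Returns (mae, corr), each (H, 2).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape or pred.ndim != 3:
        raise ValueError("pred and target must share shape (N, H, C)")

    # Mean absolute error over samples, kept separate per step and channel.
    mae = np.abs(pred - target).mean(axis=0)

    # Pearson correlation over samples, per step and channel.
    pc = pred - pred.mean(axis=0)
    tc = target - target.mean(axis=0)
    denom = np.sqrt((pc ** 2).sum(axis=0) * (tc ** 2).sum(axis=0))
    corr = np.divide((pc * tc).sum(axis=0), denom,
                     out=np.full(denom.shape, np.nan), where=denom > 0)
    return mae, corr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(100, 8, 2))
    # Synthetic predictions whose noise grows with the horizon step.
    noise = rng.normal(size=target.shape) * np.linspace(0.1, 0.8, 8)[None, :, None]
    mae, corr = per_step_action_metrics(target + noise, target)
    print("steering correlation per step:", np.round(corr[:, 1], 3))
```

Keeping the horizon axis intact in the output is what allows trends such as gradually increasing longitudinal error to be read off step by step.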

Runtime and Key Metrics

Category                  Metric             Value
Rollout prediction        Horizon            8 frames / approximately 533 ms at 15 FPS
Future action prediction  Outputs            Longitudinal and steering trends
Runtime                   Inference latency  38.9 ms / rollout
Runtime                   Prediction rate    205.5 FPS
Runtime                   Peak VRAM          1.24 GB
Resolution                Input/output       320x512

Runtime values are reference measurements from the final paper configuration and should be re-measured on target hardware.
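For re-measuring on target hardware, a timing harness along these lines may be a useful starting point. It is a sketch, not the authors' benchmark script: `benchmark_rollout`, `predicted_frame_rate`, and `rollout_fn` are hypothetical names, and `rollout_fn` stands in for whatever callable produces one 8-frame rollout. The reported figures are consistent with counting predicted frames: 8 frames per 38.9 ms is roughly 205.7 frames per second, close to the listed 205.5 FPS, although the page does not state this definition explicitly.

```python
import time


def predicted_frame_rate(latency_ms: float, horizon: int = 8) -> float:
    """Predicted frames per second, given latency per rollout of `horizon` frames."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return horizon * 1000.0 / latency_ms


def benchmark_rollout(rollout_fn, horizon: int = 8, warmup: int = 3, iters: int = 20):
    """Time `rollout_fn` and return (mean latency in ms, predicted frames per second).

    Warm-up calls are excluded so one-time setup costs do not skew the mean.
    On a GPU, rollout_fn should block until the device has finished its work
    (for example by synchronizing inside it); otherwise only launch time is measured.
    """
    for _ in range(warmup):
        rollout_fn()
    start = time.perf_counter()
    for _ in range(iters):
        rollout_fn()
    latency_ms = (time.perf_counter() - start) / iters * 1000.0
    return latency_ms, predicted_frame_rate(latency_ms, horizon)


if __name__ == "__main__":
    # Placeholder workload standing in for a real model rollout.
    latency_ms, fps = benchmark_rollout(lambda: sum(range(100_000)))
    print(f"{latency_ms:.2f} ms / rollout, {fps:.1f} predicted frames/s")
    print(f"frame interval at 15 FPS: {1000.0 / 15.0:.1f} ms")
```

Comparing the measured latency per rollout against the 66.7 ms frame interval at 15 FPS gives a quick check of whether a new rollout can be produced before the next camera frame arrives.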

Code, Data, and Checkpoints

Public release artifacts are hosted through GitHub and Hugging Face.

Citation

@misc{teleopwm2026,
  title={TeleopWM: A Real-Time Predictive World Model for Latency-Resilient Vision-Based Teleoperation},
  author={Khalil, Aws and Kwon, Jaerock},
  year={2026},
  note={Under review}
}