A full-stack testbed for controlled latency injection and systematic route-based evaluation of vision-based teleoperation and remote autonomy.
Problem. Vision-based teleoperation and remote autonomy can appear robust at low delay, yet exhibit sharp, nonlinear collapse when network latency increases. Measuring this transition reliably requires a full-stack system with controlled latency injection, accurate time synchronization, and standardized route-based evaluation.
Contributions (threefold).
LAVT is a distributed ROS 2 framework that combines (i) video streaming via GStreamer, (ii) time synchronization via Chrony, (iii) reproducible delay injection using Linux NetEm for both video and control channels, and (iv) a client-side autonomy/teleoperation interface supporting repeatable route executions in CARLA.
LAVT is designed to operate on both simulated and real vehicle platforms. The system has been integrated with a full-scale drive-by-wire (DBW) research vehicle equipped with steering, throttle, and brake-by-wire control modules. While the complete teleoperation and streaming stack runs on this vehicle, all experiments in this study were conducted in the CARLA simulator to ensure safety, repeatability, and isolation of latency effects.
We evaluate on CARLA's Town04 map using three routes (A–C) and corresponding “key” subsets that isolate the steering-intensive segments where latency-induced instability emerges first.
LAVT supports independent latency control on the video stream and control commands, enabling structured experiments that isolate performance sensitivity to network delay.
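As a rough illustration of per-channel delay injection with NetEm, the sketch below builds the `tc` command lines for adding and clearing a fixed delay on one network interface. It is not LAVT's own tooling: the helper names and the interface name `eth0` are hypothetical. It assumes the video and control streams leave through different hosts or interfaces, since a root NetEm qdisc delays all egress traffic on the interface it is attached to.

```python
from typing import List


def netem_delay_cmd(interface: str, delay_ms: int, jitter_ms: int = 0) -> List[str]:
    """Build a `tc` command that sets a NetEm egress delay on `interface`.

    Hypothetical helper: it only builds the argument list and does not run it.
    """
    if delay_ms < 0 or jitter_ms < 0:
        raise ValueError("delay and jitter must be non-negative")
    # `replace` creates the root qdisc if absent and overwrites it otherwise,
    # so the same command works when switching between latency conditions.
    cmd = ["tc", "qdisc", "replace", "dev", interface, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if jitter_ms:
        cmd.append(f"{jitter_ms}ms")
    return cmd


def netem_clear_cmd(interface: str) -> List[str]:
    """Build a `tc` command that removes the root qdisc from `interface`."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


# Example: a 150 ms delay on a hypothetical interface named eth0.
print(" ".join(netem_delay_cmd("eth0", 150)))
# prints: tc qdisc replace dev eth0 root netem delay 150ms
```

Because NetEm shapes outgoing packets, running such a command on the vehicle-side host would delay the video stream, while running it on the operator-side host would delay control commands. Executing the commands (for example with `subprocess.run(cmd, check=True)`) requires root privileges.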
At low delay (L0–L1), the system maintains stable lane keeping with small tracking error and near-perfect completion rates. As perception latency increases to approximately 150–225 ms (conditions L2–L3), the controller begins to operate on increasingly stale visual observations, introducing phase lag between perception and actuation. This produces oscillatory steering corrections and growing lateral deviation, leading to a sharp drop in route completion. Beyond this transition region, the system exhibits nonlinear degradation: completion rates collapse rapidly while tracking error among surviving runs increases substantially. Additional control-channel delay (L4–L5) further accelerates this instability by delaying corrective commands, reducing the system's ability to recover from lane deviations.
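To make the phase-lag argument concrete, the sketch below computes the phase lag that a pure time delay adds at a given oscillation frequency, using the standard relation lag = 360° × frequency × delay. The 0.5 Hz steering-correction frequency is an assumed value chosen for illustration only; it is not a measurement from these experiments.

```python
def phase_lag_deg(delay_s: float, freq_hz: float) -> float:
    """Phase lag in degrees introduced by a pure time delay at `freq_hz`."""
    return 360.0 * freq_hz * delay_s


# Assumed steering-correction frequency (illustrative, not measured).
ASSUMED_FREQ_HZ = 0.5

for delay_ms in (150, 225):
    lag = phase_lag_deg(delay_ms / 1000.0, ASSUMED_FREQ_HZ)
    print(f"{delay_ms} ms delay -> {lag:.1f} deg of lag at {ASSUMED_FREQ_HZ} Hz")
```

Under that assumption, delays of 150 ms and 225 ms add roughly 27° and 40.5° of lag, respectively. The lag grows linearly with both delay and frequency, so faster corrections on steering-intensive segments lose phase margin sooner.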
| Condition | Completion (%) | Collision Rate | Lane Invasion (mean) | P95 Cross-Track (m) |
|---|---|---|---|---|
| L0 | 100.0 | 0.20 | 0.40 | 2.30 |
| L1 | 93.3 | 0.23 | 3.97 | 2.73 |
| L2 | 50.0 | 0.67 | 15.10 | 3.91 |
| L3 | 36.7 | 0.83 | 22.13 | 9.25 |
| L4 | 30.0 | 0.77 | 18.80 | 4.64 |
| L5 | 10.0 | 0.97 | 16.83 | 13.15 |
The table above summarizes aggregate driving performance across all routes for each latency condition: completion rate, collision rate, mean lane-invasion events, and P95 cross-track error.
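One way such per-condition aggregates could be computed from per-run logs is sketched below. The `RunResult` record and its fields are hypothetical, and the metric definitions are assumptions rather than the paper's: collision rate is taken as the fraction of runs with at least one collision, and P95 cross-track error is a nearest-rank percentile over samples pooled from all runs.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class RunResult:
    """Hypothetical per-run record; field names are illustrative."""
    completed: bool             # route finished
    collided: bool              # at least one collision during the run
    lane_invasions: int         # number of lane-invasion events
    cross_track_m: List[float]  # absolute cross-track error samples (m)


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def summarize(runs: Sequence[RunResult]) -> Dict[str, float]:
    """Aggregate one latency condition under the assumed definitions."""
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    pooled = [e for r in runs for e in r.cross_track_m]
    return {
        "completion_pct": 100.0 * sum(r.completed for r in runs) / n,
        "collision_rate": sum(r.collided for r in runs) / n,
        "lane_invasion_mean": sum(r.lane_invasions for r in runs) / n,
        "p95_cross_track_m": percentile(pooled, 95),
    }


# Toy data for illustration only; not results from the study.
demo = [
    RunResult(True, False, 0, [0.1, 0.2]),
    RunResult(False, True, 4, [0.5, 3.0]),
]
print(summarize(demo))
```

Pooling samples lets long runs dominate the percentile. Computing P95 per run and then averaging is an equally plausible definition, so check the evaluation scripts for the one actually used.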
Recommended order: SETUP · QUICKSTART · EXPERIMENTS · TROUBLESHOOTING
To cite this work, use the following BibTeX entry (fields will be updated once the paper is public):
@misc{lavt2026,
title = {Nonlinear Performance Degradation of Vision-Based Teleoperation under Network Latency},
author = {Khalil, Aws and Kwon, Jaerock},
year = {2026},
note = {Under review},
howpublished = {\url{https://github.com/bimilab/paper-LAVT}}
}