We propose AutoHorizon, the first test-time method for automatically and dynamically determining the execution horizon of flow-based VLAs.
Action chunking has become standard practice in flow-based Vision-Language-Action (VLA) models. However, the effect and choice of the execution horizon (the number of actions executed from each predicted chunk) remain underexplored. In this work, we first show that varying the execution horizon leads to substantial performance variations: success rates initially improve and then decline as the horizon increases.
To uncover the reasons, we analyze the cross- and self-attention weights in flow-based VLAs and reveal two key phenomena: (i) intra-chunk actions attend invariantly to vision–language tokens, limiting adaptability to environmental changes; and (ii) the initial and terminal action tokens serve as stable anchors, forming latent centers around which intermediate actions are organized.
Motivated by these insights, we interpret action self-attention weights as a proxy for the model's predictive limit and propose AutoHorizon, the first test-time method that dynamically estimates the execution horizon for each predicted action chunk to adapt to changing perceptual conditions. Across simulated and real-world robotic manipulation tasks, AutoHorizon achieves strong performance, incurs negligible computational overhead, and generalizes across diverse tasks and flow-based models.
Motivation. Different execution horizons lead to substantial success rate variations, highlighting the importance of choosing an appropriate execution horizon.
Framework overview. AutoHorizon analyzes action self-attention weights to dynamically determine the optimal execution horizon for each predicted action chunk, adapting to changing perceptual conditions.
Methodology. Cross- and self-attention heatmaps reveal that intra-chunk actions attend invariantly to vision–language tokens, while the initial and terminal action tokens serve as stable anchors. Based on these observations, we interpret action self-attention weights as implicit indicators of the model's predictive limit and propose AutoHorizon, a method that dynamically assigns an execution horizon to each predicted action chunk at test time.
| Setting | LIB-Spatial (p = 10) | LIB-Object (p = 10) | LIB-Goal (p = 10) | LIB-10 (p = 10) | LIB-Spatial (p = 50) | LIB-Object (p = 50) | LIB-Goal (p = 50) | LIB-10 (p = 50) |
|---|---|---|---|---|---|---|---|---|
| *Static Oracle* | | | | | | | | |
| e = 0.2p | 98.5±0.5 | 94.1±1.0 | 90.4±0.0 | 76.0±1.2 | 94.9±0.2 | 97.1±0.2 | 92.7±1.0 | 91.2±2.3 |
| e = 0.4p | 98.9±0.4 | 98.1±0.5 | 94.8±0.9 | 82.8±1.5 | 92.4±0.3 | 94.1±1.9 | 90.9±0.7 | 88.7±0.8 |
| e = 0.6p | 98.8±0.3 | 98.7±0.7 | 95.2±0.7 | 85.9±0.8 | 87.1±1.9 | 91.5±2.5 | 86.0±3.3 | 82.4±2.0 |
| e = 0.8p | 98.9±0.4 | 99.1±0.5 | 95.1±1.1 | 88.5±1.5 | 81.2±2.7 | 88.9±1.6 | 78.4±2.0 | 76.3±2.5 |
| e = 1.0p | 99.1±0.5 | 98.8±0.9 | 97.2±1.0 | 90.4±0.6 | 71.5±1.3 | 67.9±0.9 | 74.5±2.7 | 68.6±1.2 |
| Static Oracle+ | 99.1±0.5 | 99.1±0.5 | 97.2±1.0 | 90.4±0.6 | 96.4±0.3 | 97.6±0.6 | 93.9±0.5 | 91.9±0.4 |
| Random | 98.4±1.2 | 98.4±0.7 | 96.3±0.7 | 86.3±1.5 | 82.8±1.5 | 90.3±1.6 | 85.3±2.3 | 83.3±0.7 |
| AutoHorizon | 99.1±0.2 | 99.2±0.3 | 97.5±0.2 | 91.6±0.7 | 96.5±0.9 | 98.0±0.6 | 94.4±1.0 | 92.1±1.0 |
| Setting | Adjust Bottle | Pick Bottles | Place Container | Stack Bowls | Place Cup | Open Laptop | Press Stapler |
|---|---|---|---|---|---|---|---|
| *Static Oracle* | | | | | | | |
| e = 0.2p | 79.0±1.4 | 40.7±0.5 | 84.0±2.8 | 90.7±1.2 | 83.7±0.9 | 68.3±1.9 | 44.0±0.0 |
| e = 0.4p | 89.0±0.0 | 67.0±0.0 | 83.0±0.0 | 87.3±1.7 | 77.7±2.5 | 78.3±0.5 | 48.0±0.0 |
| e = 0.6p | 89.3±0.9 | 65.0±0.0 | 82.0±0.0 | 86.7±2.5 | 68.3±0.5 | 71.7±2.1 | 67.0±0.0 |
| e = 0.8p | 76.7±1.2 | 58.0±1.4 | 81.0±0.0 | 90.0±2.8 | 66.0±1.6 | 78.7±0.5 | 70.0±0.0 |
| e = 1.0p | 58.7±1.2 | 30.0±0.0 | 76.7±0.5 | 87.7±1.2 | 56.3±3.8 | 84.0±0.0 | 71.0±0.0 |
| Static Oracle+ | 98.7±0.5 | 67.0±0.0 | 91.0±0.8 | 90.7±1.2 | 83.7±0.9 | 84.0±0.0 | 72.0±0.0 |
| Random | 85.3±0.5 | 60.0±0.0 | 86.0±0.0 | 88.7±3.4 | 70.7±2.5 | 82.0±0.0 | 69.0±0.0 |
| AutoHorizon | 100.0±0.0 | 68.0±0.0 | 91.0±0.8 | 92.0±0.8 | 85.3±2.1 | 84.7±0.9 | 75.0±0.0 |
Under stable conditions, such as reaching for or transporting an object, the estimated horizons increase, accelerating execution progress. In contrast, when the robot begins to physically interact with the environment (e.g., grasping or placing objects), the estimated horizons adaptively shorten, enhancing reactivity to environmental changes.
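At rollout time, a per-chunk horizon plugs into a standard receding-horizon loop: predict a chunk, execute only its first h actions, then replan. The sketch below is a minimal illustration; the interfaces (`predict_chunk`, `env.step`) and the toy classes are hypothetical stand-ins, not the actual policy or environment API:

```python
def rollout(env, policy, estimate_horizon, max_steps=300):
    """Execute only the first h actions of each chunk, re-estimating h per chunk."""
    obs = env.reset()
    steps = 0
    while steps < max_steps:
        chunk, attn = policy.predict_chunk(obs)  # p actions + self-attention map
        h = estimate_horizon(attn)               # dynamic execution horizon
        for action in chunk[:h]:                 # open-loop within the horizon
            obs, done = env.step(action)
            steps += 1
            if done or steps >= max_steps:
                return obs
    return obs


class DummyEnv:
    """Toy environment that finishes after 5 steps."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, self.t >= 5


class DummyPolicy:
    """Toy policy returning a 10-action chunk and no attention map."""
    def predict_chunk(self, obs):
        return list(range(10)), None
```

With a fixed stub horizon of 3, `rollout(DummyEnv(), DummyPolicy(), lambda attn: 3)` replans after three executed actions and terminates once the toy environment signals completion.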