Multi-turn GUI agents enable complex task completion through sequential decision-making, but they suffer from severe context inflation as interaction history accumulates. Existing strategies either sacrifice long-term context via truncation or compromise spatial structure through token pruning. In this paper, we propose Coordinate Compression Policy Optimization (CCPO), an efficient framework that couples visual compression with policy optimization for multi-turn GUI agents. CCPO introduces Coordinate-Aware Spatial Compression (CASC), which aggregates coordinates from multiple rollouts to capture target-relevant regions and progressively narrows historical attention around key visual areas. From interactions across rollouts, CASC adaptively constructs attention boundaries that concentrate computation on the most informative regions of the scene. We further design a Distance-Based Advantage that provides fine-grained learning signals based on distance rather than binary correctness, improving both grounding accuracy and compression quality. Extensive experiments demonstrate that CCPO achieves state-of-the-art performance across four benchmarks with up to 55% token compression and a 3.8× training speedup.
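The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the aggregation rule or the reward shape, so the padded min/max bounding box, the `margin` parameter, the linear distance reward, and the group normalization below are all assumptions chosen for illustration. The function names `aggregate_region` and `distance_advantages` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def aggregate_region(points: List[Point], width: int, height: int,
                     margin: float = 0.1) -> Box:
    """Assumed aggregation rule: a padded bounding box around the
    coordinates predicted by several rollouts, clipped to the screen."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    pad_x = margin * width
    pad_y = margin * height
    return (
        max(0.0, min(xs) - pad_x),
        max(0.0, min(ys) - pad_y),
        min(float(width), max(xs) + pad_x),
        min(float(height), max(ys) + pad_y),
    )


def distance_advantages(preds: List[Point], target: Point,
                        diag: float) -> List[float]:
    """Assumed reward shape: reward falls linearly with the distance to the
    target (normalized by the screen diagonal), then rewards are normalized
    within the rollout group to obtain advantages."""
    rewards = [max(0.0, 1.0 - math.dist(p, target) / diag) for p in preds]
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + 1e-6) for r in rewards]


# Example: three rollouts on a 1000x2000 screen.
clicks = [(100.0, 200.0), (120.0, 220.0), (110.0, 210.0)]
region = aggregate_region(clicks, width=1000, height=2000)

preds = [(110.0, 210.0), (300.0, 400.0), (900.0, 1800.0)]
adv = distance_advantages(preds, target=(110.0, 210.0),
                          diag=math.hypot(1000, 2000))
```

Under these assumptions, a rollout that lands closer to the target receives a larger advantage than one that lands farther away, even when neither is an exact hit. A binary correctness reward cannot provide that graded signal.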
| Model | History Length | Token Length ↓ | Compression Ratio ↑ | Training Time (s/step) |
|---|---|---|---|---|
| SO-RL-3B | 1AO | 6998 | 0.0% | 515 |
| SO-RL-3B | 3AO | 9888 | 0.0% | 660 |
| CCPO-3B | 1AO | 4271 | 38.9% | 154 (3.3×) |
| CCPO-3B | 3AO | 4460 | 54.9% | 174 (3.8×) |
| SO-RL-7B | 1AO | 7026 | 0.0% | 569 |
| SO-RL-7B | 3AO | 9550 | 0.0% | 717 |
| CCPO-7B | 1AO | 4262 | 39.3% | 186 (3.1×) |
| CCPO-7B | 3AO | 4473 | 53.2% | 204 (3.5×) |
Table 1. Training efficiency comparison between CCPO and Semi-Online RL (SO-RL) on the Android Control dataset.
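The compression ratios and speedups in Table 1 can be recomputed from its token lengths and step times. The short Python sketch below assumes the compression ratio is defined as one minus the ratio of CCPO tokens to SO-RL tokens, and the speedup as SO-RL step time divided by CCPO step time. These definitions are inferred from the table rather than stated in the text, and the helper names are hypothetical.

```python
def compression_ratio(baseline_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed relative to the baseline (assumed definition)."""
    return 100.0 * (1.0 - compressed_tokens / baseline_tokens)


def speedup(baseline_time: float, method_time: float) -> float:
    """How many times faster one training step is (assumed definition)."""
    return baseline_time / method_time


# 3AO rows of Table 1: (SO-RL tokens, CCPO tokens, SO-RL s/step, CCPO s/step)
rows_3ao = {
    "3B": (9888, 4460, 660, 174),
    "7B": (9550, 4473, 717, 204),
}

for size, (base_tok, ccpo_tok, base_t, ccpo_t) in rows_3ao.items():
    ratio = compression_ratio(base_tok, ccpo_tok)
    gain = speedup(base_t, ccpo_t)
    print(f"{size}: compression {ratio:.1f}%, speedup {gain:.1f}x")
```

With these definitions the 3AO rows reproduce the reported 54.9% and 53.2% compression and the 3.8× and 3.5× speedups. Other rows may differ in the last digit because the table entries are rounded.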
| Method | Compute Load (TFLOPS) ↓ | Token Latency (ms) ↓ | Step Latency (s) ↓ |
|---|---|---|---|
| SO-RL | 9.6 | 0.064 | 297.1 |
| CCPO | 5.4 (-44%) | 0.057 (-10%) | 194.5 (-35%) |
Table 2. Training efficiency comparison in terms of compute load and latency.
| Model | History Format | AC-High TM | AC-High GR | AC-High SR | GUI Odyssey TM | GUI Odyssey GR | GUI Odyssey SR |
|---|---|---|---|---|---|---|---|
| Open-source Models | |||||||
| OS-Atlas-4B ZS | A | 49.0 | 49.5 | 22.8 | 49.6 | 34.6 | 20.3 |
| OS-Atlas-4B FT | A | 84.7 | 73.8 | 67.5 | 83.5 | 61.4 | 56.4 |
| Qwen2.5VL-3B | A | 47.8 | 46.5 | 38.9 | 37.4 | 26.5 | 26.7 |
| UI-R1-3B | -- | 57.9 | 55.7 | 45.4 | 52.2 | 34.5 | 32.5 |
| GUI-R1-3B | A | 58.0 | 56.2 | 46.6 | 54.8 | 41.5 | 41.3 |
| OS-Genesis-7B | AO | 65.9 | - | 44.4 | 11.7 | - | 3.6 |
| Aguvis-7B | A | 65.6 | - | 54.2 | 26.7 | - | 13.5 |
| GUI-R1-7B | A | 71.6 | 65.6 | 51.7 | 65.5 | 43.6 | 38.8 |
| AgentCPM-GUI-8B | A | 77.7 | - | 69.2 | 90.8 | - | 75.0 |
| OS-Atlas-7B ZS | A | 57.4 | 54.9 | 29.8 | 60.4 | 39.7 | 27.0 |
| OS-Atlas-7B FT | A | 85.2 | 78.5 | 71.2 | 84.5 | 67.8 | 62.0 |
| UI-TARS-7B | AOT | 83.7 | 80.5 | 72.5 | 94.6 | 90.1 | 87.0 |
| UI-S1-7B | AOT | 79.9 | 73.4 | 68.2 | 76.3 | 61.7 | 59.5 |
| Our Models | |||||||
| Qwen2.5VL-3B (0-shot) | AO | 24.9 | 68.3 | 20.2 | 27.8 | 46.4 | 14.7 |
| w/ SFT | AO | 85.2 | 73.5 | 68.6 | 88.0 | 84.3 | 75.9 |
| w/ Semi-online RL | AO | 83.7 | 74.8 | 67.5 | 82.6 | 81.3 | 71.3 |
| CCPO-3B-1AO | AO | 85.3 | 76.7 | 70.6 | 91.7 | 87.2 | 81.1 |
| CCPO-3B-3AO | AO | 85.7 | 77.5 | 70.8 | 90.6 | 88.5 | 80.9 |
| Qwen2.5VL-7B (0-shot) | AO | 58.9 | 70.3 | 44.1 | 55.8 | 50.8 | 31.8 |
| w/ SFT | AO | 85.9 | 75.9 | 70.6 | 88.0 | 84.6 | 76.0 |
| w/ Semi-online RL | AO | 86.3 | 76.7 | 70.6 | 89.2 | 84.9 | 76.7 |
| CCPO-7B-1AO | AO | 86.4 | 78.8 | 72.2 | 91.1 | 87.2 | 80.3 |
| CCPO-7B-3AO | AO | 86.9 | 79.7 | 73.3 | 91.8 | 89.3 | 82.4 |
Table 3. Results of CCPO on the Android Control High (AC-High) and GUI Odyssey navigation tasks. In the History Format column, A, O, and T indicate that the model's history includes Actions, Observations, and Thoughts, respectively.
| Method | Param | Mind2Web Cross-Task | Mind2Web Cross-Website | Mind2Web Cross-Domain | AITW Overall | AITW ClickAvg |
|---|---|---|---|---|---|---|
| Qwen-VL 9.6B | 9.6B | 13.3 | 9.2 | 12.0 | 54.3 | 57.4 |
| SeeClick | 9.6B | 25.5 | 16.4 | 20.8 | 59.3 | 66.4 |
| R-VLM | 9.6B | 28.7 | 26.1 | 24.3 | 64.9 | 71.0 |
| Iris | 9.6B | 32.0 | 26.2 | 28.8 | 63.6 | 71.0 |
| Qwen2-VL | 2B | 46.7 | 42.2 | 44.6 | 57.7 | -- |
| ShowUI-2B | 2B | 37.2 | 35.1 | 35.2 | 70.0 | -- |
| SimpAgent | 2B | 48.7 | 42.2 | 45.0 | 71.5 | -- |
| TongUI-3B | 3B | 48.8 | 48.1 | 49.5 | 71.6 | -- |
| TongUI-7B | 7B | 53.4 | 49.0 | 52.9 | 73.3 | -- |
| Qwen2.5-VL-3B w/ SFT | 3B | 52.0 | 46.5 | 48.7 | 70.8 | 78.4 |
| CCPO-3B 1AO | 3B | 54.6 | 50.6 | 50.6 | 71.8 | 79.7 |
| CCPO-3B 3AO | 3B | 56.5 | 51.0 | 51.8 | 73.1 | 80.4 |
| Qwen2.5-VL-7B w/ SFT | 7B | 55.6 | 51.3 | 52.0 | 72.3 | 80.2 |
| CCPO-7B-1AO | 7B | 58.0 | 53.4 | 55.7 | 73.5 | 81.0 |
| CCPO-7B-3AO | 7B | 59.5 | 53.6 | 56.5 | 74.4 | 81.4 |
Table 4. Results of CCPO on the Mind2Web and AITW benchmarks across different settings.
| Method | General | Single | Web Shopping | Install | Google Apps | Overall | ClickAvg |
|---|---|---|---|---|---|---|---|
| Qwen-VL 9.6B | 49.5 | 64.7 | 50.7 | 59.9 | 46.9 | 54.3 | 57.4 |
| SeeClick | 54.0 | 73.7 | 57.6 | 66.4 | 54.9 | 59.3 | 66.4 |
| R-VLM | 59.9 | 72.5 | 61.7 | 70.6 | 59.6 | 64.9 | 71.0 |
| Qwen2-VL | 48.3 | 57.8 | 51.6 | 77.4 | 52.9 | 57.7 | -- |
| Iris | 61.5 | 71.4 | 58.3 | 66.4 | 60.2 | 63.6 | 71.0 |
| ShowUI-2B | 63.9 | 77.5 | 66.6 | 72.5 | 69.7 | 70.0 | -- |
| SimpAgent | 64.1 | 76.2 | 67.2 | 75.8 | 74.0 | 71.5 | -- |
| TongUI-3B | 65.6 | 77.0 | 65.8 | 75.1 | 74.5 | 71.6 | -- |
| TongUI-7B | 67.6 | 79.9 | 69.1 | 76.3 | 73.5 | 73.3 | -- |
| Qwen2.5-VL-3B w/ SFT | 61.5 | 75.4 | 67.2 | 75.8 | 74.1 | 70.8 | 78.4 |
| CCPO-3B 1AO w/o CR | 62.7 | 78.2 | 65.1 | 75.5 | 76.4 | 71.6 | 79.1 |
| CCPO-3B 1AO | 64.3 | 76.1 | 67.2 | 76.1 | 75.4 | 71.8 | 79.7 |
| CCPO-3B 3AO w/o CR | 65.2 | 79.2 | 66.6 | 76.5 | 75.8 | 72.7 | 80.0 |
| CCPO-3B 3AO | 65.3 | 77.5 | 68.3 | 78.3 | 76.0 | 73.1 | 80.4 |
| Qwen2.5-VL-7B w/ SFT | 64.8 | 77.5 | 68.5 | 76.9 | 73.9 | 72.3 | 80.2 |
| CCPO-7B 1AO w/o CR | 66.4 | 79.4 | 67.5 | 75.9 | 76.2 | 73.1 | 79.3 |
| CCPO-7B-1AO | 67.0 | 78.2 | 68.7 | 77.3 | 76.2 | 73.5 | 81.0 |
| CCPO-7B 3AO w/o CR | 64.9 | 79.4 | 70.0 | 77.3 | 79.0 | 74.1 | 80.5 |
| CCPO-7B-3AO | 68.3 | 78.7 | 69.6 | 77.3 | 78.0 | 74.4 | 81.4 |
Table 5. Results of CCPO-MAX on the AITW benchmark.
[Figure: SFT Baseline vs. CCPO (Ours); panels: GUI Odyssey, AITW, Android Control]
| Model | History Length | TM | GR | SR |
|---|---|---|---|---|
| Qwen2.5-VL-7B | 1AO | 83.75 | 74.95 | 67.97 |
| Qwen2.5-VL-7B | 2AO | 85.30 | 75.95 | 70.00 |
| Qwen2.5-VL-7B | 3AO | 85.94 | 75.95 | 70.60 |
| Qwen2.5-VL-7B | 4AO | 84.89 | 75.77 | 69.65 |
| CCPO-7B | 1AO | 86.45 | 78.80 | 72.18 |
| CCPO-7B | 2AO | 86.86 | 79.48 | 73.19 |
| CCPO-7B | 3AO | 86.89 | 79.71 | 73.25 |
| CCPO-7B | 4AO | 86.27 | 80.20 | 73.11 |
Table 6. Performance comparison on the Android Control (AC) dataset with history lengths from 1AO to 4AO.
| Method | AC-TM | AC-GR | AC-SR |
|---|---|---|---|
| Qwen2.5VL-7B SFT | 85.94 | 75.95 | 70.60 |
| + Semi-online | 86.27 (+0.33) | 77.93 (+1.98) | 72.35 (+1.75) |
| + CASC | 86.72 (+0.78) | 79.12 (+3.17) | 72.70 (+2.10) |
| + CASC + CR | 86.89 (+0.95) | 79.71 (+3.76) | 73.25 (+2.65) |
Table 7. Ablation study of different components on the Android Control dataset.
@article{Anonymous2026compress2focus,
author = {Anonymous},
title = {Compress to Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents},
journal = {xxxx},
year = {2026},
}