Compress2Focus: Project Page

Figure 1. Existing multi-turn methods truncate the visual history because of limited context length. The proposed CCPO method preserves the key parts of the visual history, keeping a longer stretch of the trajectory visible.

Abstract

Multi-turn GUI agents complete complex tasks through sequential decision-making, but they suffer from severe context inflation as interaction history accumulates. Existing strategies either sacrifice long-term context through truncation or compromise spatial structure through token pruning. We propose Coordinate Compression Policy Optimization (CCPO), an efficient framework that couples visual compression with policy optimization for multi-turn GUI agents. CCPO introduces Coordinate-Aware Spatial Compression (CASC), which aggregates coordinates from multiple rollouts to capture target-relevant regions and progressively narrows historical attention to key visual areas. From interactions across rollouts, CASC adaptively constructs attention boundaries that concentrate computation on the most informative regions of the scene. We further design a Distance-Based Advantage that provides fine-grained learning signals based on distance rather than binary correctness, improving both grounding accuracy and compression quality. Extensive experiments show that CCPO achieves state-of-the-art performance across four benchmarks with up to 55% token compression and a 3.8× training speedup.
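
To make the idea of a distance-based learning signal concrete, the following is a minimal sketch and not the authors' implementation. It assumes a reward that decays linearly with the Euclidean distance between the predicted and target coordinates (normalized by the screen diagonal) and a group-relative advantage obtained by normalizing rewards across rollouts of the same step. The function names `distance_reward` and `group_advantages`, the linear reward shape, and the normalization are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev

def distance_reward(pred, target, screen_w, screen_h):
    """Hypothetical reward in [0, 1]: 1 at the target, 0 at one screen diagonal away."""
    dist = math.dist(pred, target)
    diag = math.hypot(screen_w, screen_h)
    return max(0.0, 1.0 - dist / diag)

def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one rollout group (assumed group-relative baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Three rollouts for the same step; the target is at (100, 200) on a 1080x2400 screen.
target = (100, 200)
preds = [(100, 200), (130, 240), (600, 1200)]
rewards = [distance_reward(p, target, 1080, 2400) for p in preds]
advantages = group_advantages(rewards)
```

Under these assumptions, a near miss such as (130, 240) receives a higher advantage than a far miss such as (600, 1200), whereas a binary correctness reward could score both as equally wrong.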

Methodology

Figure 2. Overview of the CCPO framework. The training phase (top) optimizes the policy via multi-turn rollouts evaluated by the Distance-Aware Advantage. The Coordinate-Aware Spatial Compression module (bottom) tracks n actions and aggregates their coordinates to predict the ROI at each step, then crops the task-relevant region as a focused visual history h_{t+1}.
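
The aggregate-then-crop step in the caption can be illustrated with a small sketch. This is one plausible reading, not the method as implemented: it assumes the ROI is the padded bounding box of the coordinates collected across rollouts, clamped to the screenshot, and that a step with no coordinate actions keeps the full frame. The names `aggregate_roi` and `crop` and the `margin` parameter are hypothetical.

```python
def aggregate_roi(points, img_w, img_h, margin=0.1):
    """Bounding box around rollout coordinates, padded and clamped to the image.

    points: list of (x, y) pixel coordinates from multiple rollouts (assumed input).
    margin: padding as a fraction of the image size (hypothetical parameter).
    Returns (left, top, right, bottom) in pixels.
    """
    if not points:
        return (0, 0, img_w, img_h)  # no coordinate actions: keep the full frame
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    pad_x = margin * img_w
    pad_y = margin * img_h
    left = max(0, int(min(xs) - pad_x))
    top = max(0, int(min(ys) - pad_y))
    right = min(img_w, int(max(xs) + pad_x))
    bottom = min(img_h, int(max(ys) + pad_y))
    return (left, top, right, bottom)

def crop(image, box):
    """Crop a row-major 2D list of pixels to box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

# Toy "screenshot" of width 10 and height 20, with clicks from three rollouts.
image = [[(x, y) for x in range(10)] for y in range(20)]
clicks = [(4, 8), (5, 9), (6, 10)]
box = aggregate_roi(clicks, img_w=10, img_h=20, margin=0.1)
focused = crop(image, box)
```

In this toy case the 10x20 frame shrinks to a 4x6 crop around the clicked area, which is the kind of reduction that would lower the visual token count of the stored history.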

Experimental Results

Training Efficiency

Model	History Length	Token Length ↓	Compression Ratio ↑	Training Time (s/step)
SO-RL-3B	1AO	6998	0.0%	515
SO-RL-3B	3AO	9888	0.0%	660
CCPO-3B	1AO	4271	38.9%	154 (3.3×)
CCPO-3B	3AO	4460	54.9%	174 (3.8×)
SO-RL-7B	1AO	7026	0.0%	569
SO-RL-7B	3AO	9550	0.0%	717
CCPO-7B	1AO	4262	39.3%	186 (3.1×)
CCPO-7B	3AO	4473	53.2%	204 (3.5×)

Table 1. Training efficiency of CCPO versus Semi-Online RL (SO-RL) on the Android Control dataset. Speedups in parentheses are relative to SO-RL at the same model size and history length.
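
The derived columns of Table 1 can be checked with a few lines of arithmetic. The sketch below assumes the compression ratio is the fraction of tokens removed relative to SO-RL at the same model size and history length, and that the speedup is the ratio of seconds per step; this reading is inferred from the reported numbers rather than stated on the page, and the function names are illustrative.

```python
def compression_ratio(baseline_tokens, compressed_tokens):
    """Percentage of tokens removed relative to the baseline."""
    return 100.0 * (1.0 - compressed_tokens / baseline_tokens)

def speedup(baseline_s_per_step, method_s_per_step):
    """How many times faster one training step is than the baseline."""
    return baseline_s_per_step / method_s_per_step

# 3B models with 3AO history (token lengths and step times taken from Table 1).
ratio_3b_3ao = compression_ratio(9888, 4460)
speed_3b_3ao = speedup(660, 174)
print(f"{ratio_3b_3ao:.1f}% fewer tokens, {speed_3b_3ao:.1f}x faster")
# prints "54.9% fewer tokens, 3.8x faster"
```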

Method	Compute Load (TFLOPS) ↓	Token Latency (ms) ↓	Step Latency (s) ↓
SO-RL	9.6	0.064	297.1
CCPO	5.4 (-44%)	0.057 (-10%)	194.5 (-35%)

Table 2. Training efficiency comparison in terms of compute load and latency.

Results on Android Control and GUI Odyssey datasets

Model	History Format (AOT)	AC-High TM	AC-High GR	AC-High SR	GUI Odyssey TM	GUI Odyssey GR	GUI Odyssey SR
Open-source Models
OS-Atlas-4B ZS	A	49.0	49.5	22.8	49.6	34.6	20.3
OS-Atlas-4B FT	A	84.7	73.8	67.5	83.5	61.4	56.4
Qwen2.5VL-3B	A	47.8	46.5	38.9	37.4	26.5	26.7
UI-R1-3B	--	57.9	55.7	45.4	52.2	34.5	32.5
GUI-R1-3B	A	58.0	56.2	46.6	54.8	41.5	41.3
OS-Genesis-7B	AO	65.9	-	44.4	11.7	-	3.6
Aguvis-7B	A	65.6	-	54.2	26.7	-	13.5
GUI-R1-7B	A	71.6	65.6	51.7	65.5	43.6	38.8
AgentCPM-GUI-8B	A	77.7	-	69.2	90.8	-	75.0
OS-Atlas-7B ZS	A	57.4	54.9	29.8	60.4	39.7	27.0
OS-Atlas-7B FT	A	85.2	78.5	71.2	84.5	67.8	62.0
UI-TARS-7B	AOT	83.7	80.5	72.5	94.6	90.1	87.0
UI-S1-7B	AOT	79.9	73.4	68.2	76.3	61.7	59.5
Our Models
Qwen2.5VL-3B (0-shot)	AO	24.9	68.3	20.2	27.8	46.4	14.7
w/ SFT	AO	85.2	73.5	68.6	88.0	84.3	75.9
w/ Semi-online RL	AO	83.7	74.8	67.5	82.6	81.3	71.3
CCPO-3B-1AO	AO	85.3	76.7	70.6	91.7	87.2	81.1
CCPO-3B-3AO	AO	85.7	77.5	70.8	90.6	88.5	80.9
Qwen2.5VL-7B (0-shot)	AO	58.9	70.3	44.1	55.8	50.8	31.8
w/ SFT	AO	85.9	75.9	70.6	88.0	84.6	76.0
w/ Semi-online RL	AO	86.3	76.7	70.6	89.2	84.9	76.7
CCPO-7B-1AO	AO	86.4	78.8	72.2	91.1	87.2	80.3
CCPO-7B-3AO	AO	86.9	79.7	73.3	91.8	89.3	82.4

Table 3. Results of CCPO on the Android Control (High) and GUI Odyssey navigation tasks. In the History Format column, A, O, and T indicate that the model's history includes actions, observations, and thoughts, respectively.

Results on Mind2Web and AITW

Method	Param	Mind2Web Cross-Task	Mind2Web Cross-Website	Mind2Web Cross-Domain	AITW Overall	AITW ClickAvg
Qwen-VL 9.6B	9.6B	13.3	9.2	12.0	54.3	57.4
SeeClick	9.6B	25.5	16.4	20.8	59.3	66.4
R-VLM	9.6B	28.7	26.1	24.3	64.9	71.0
Iris	9.6B	32.0	26.2	28.8	63.6	71.0
Qwen2-VL	2B	46.7	42.2	44.6	57.7	--
ShowUI-2B	2B	37.2	35.1	35.2	70.0	--
SimpAgent	2B	48.7	42.2	45.0	71.5	--
TongUI-3B	3B	48.8	48.1	49.5	71.6	--
TongUI-7B	7B	53.4	49.0	52.9	73.3	--
Qwen2.5-VL-3B w/ SFT	3B	52.0	46.5	48.7	70.8	78.4
CCPO-3B 1AO	3B	54.6	50.6	50.6	71.8	79.7
CCPO-3B 3AO	3B	56.5	51.0	51.8	73.1	80.4
Qwen2.5-VL-7B w/ SFT	7B	55.6	51.3	52.0	72.3	80.2
CCPO-7B-1AO	7B	58.0	53.4	55.7	73.5	81.0
CCPO-7B-3AO	7B	59.5	53.6	56.5	74.4	81.4

Table 4. Results of CCPO on the Mind2Web and AITW benchmarks across different settings.

Results on AITW Benchmark

Method	General	Single	Web Shopping	Install	Google Apps	Overall	ClickAvg
Qwen-VL 9.6B	49.5	64.7	50.7	59.9	46.9	54.3	57.4
SeeClick	54.0	73.7	57.6	66.4	54.9	59.3	66.4
R-VLM	59.9	72.5	61.7	70.6	59.6	64.9	71.0
Qwen2-VL	48.3	57.8	51.6	77.4	52.9	57.7	--
Iris	61.5	71.4	58.3	66.4	60.2	63.6	71.0
ShowUI-2B	63.9	77.5	66.6	72.5	69.7	70.0	--
SimpAgent	64.1	76.2	67.2	75.8	74.0	71.5	--
TongUI-3B	65.6	77.0	65.8	75.1	74.5	71.6	--
TongUI-7B	67.6	79.9	69.1	76.3	73.5	73.3	--
Qwen2.5-VL-3B w/ SFT	61.5	75.4	67.2	75.8	74.1	70.8	78.4
CCPO-3B 1AO w/o CR	62.7	78.2	65.1	75.5	76.4	71.6	79.1
CCPO-3B 1AO	64.3	76.1	67.2	76.1	75.4	71.8	79.7
CCPO-3B 3AO w/o CR	65.2	79.2	66.6	76.5	75.8	72.7	80.0
CCPO-3B 3AO	65.3	77.5	68.3	78.3	76.0	73.1	80.4
Qwen2.5-VL-7B w/ SFT	64.8	77.5	68.5	76.9	73.9	72.3	80.2
CCPO-7B 1AO w/o CR	66.4	79.4	67.5	75.9	76.2	73.1	79.3
CCPO-7B-1AO	67.0	78.2	68.7	77.3	76.2	73.5	81.0
CCPO-7B 3AO w/o CR	64.9	79.4	70.0	77.3	79.0	74.1	80.5
CCPO-7B-3AO	68.3	78.7	69.6	77.3	78.0	74.4	81.4

Table 5. Results of CCPO-MAX on the AITW benchmark.


Analysis

Coordinate-Based Actions Distribution for Three Datasets

[Figure: distribution of coordinate-based actions, with one panel each for GUI Odyssey, AITW, and Android Control.]

Performance Comparison for AC Datasets

Model	History Length	TM	GR	SR
Qwen2.5-VL-7B	1AO	83.75	74.95	67.97
Qwen2.5-VL-7B	2AO	85.30	75.95	70.00
Qwen2.5-VL-7B	3AO	85.94	75.95	70.60
Qwen2.5-VL-7B	4AO	84.89	75.77	69.65
CCPO-7B	1AO	86.45	78.80	72.18
CCPO-7B	2AO	86.86	79.48	73.19
CCPO-7B	3AO	86.89	79.71	73.25
CCPO-7B	4AO	86.27	80.20	73.11

Table 6. Performance on the Android Control (AC) dataset with history lengths from 1AO to 4AO.

Ablation Study

Method	AC-TM	AC-GR	AC-SR
Qwen2.5VL-7B SFT	85.94	75.95	70.60
+ Semi-online	86.27 (+0.33)	77.93 (+1.98)	72.35 (+1.75)
+ CASC	86.72 (+0.78)	79.12 (+3.17)	72.70 (+2.10)
+ CASC + CR	86.89 (+0.95)	79.71 (+3.76)	73.25 (+2.65)

Table 7. Ablation study of different components on the Android Control dataset.

BibTeX

@article{Anonymous2026compress2focus,
  author    = {Anonymous},
  title     = {Compress to Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents},
  journal   = {xxxx},
  year      = {2026},
}
