juncheng.hu@u.nus.edu · {dujw, zhangx7, Joey_Zhou}@a-star.edu.sg
Figure: how prior routes externalize intermediate geometry. Visual Artifacts: pixel approximation; no engine-exact certificate. Textual traces: intermediate text lacks exact validation. Executable scripts: geometry checked after generation; no per-action engine feedback.
Prior routes externalize intermediate geometry as visual artifacts, textual traces, or executable scripts. Each surfaces an intermediate state but leaves its geometric validity uncertified during construction. Draw2Think adds a fourth route: a constraint-agentic harness in which a frozen VLM selects typed ToolSpecs, the GeoGebra engine updates an engine-valid canvas state, and structured observations return after each action. The distinction is less about externalizing state than about when verification enters the loop.
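The ordering in this loop (act, engine check, observe, act again) can be sketched in a few lines. This is a minimal sketch under loud assumptions: a scripted function stands in for the frozen VLM, `ToyEngine` stands in for GeoGebra, and `Action`, `Observation`, the `point` tool, and `run_episode` are hypothetical names, not Draw2Think's interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins: a scripted policy plays the frozen VLM and ToyEngine
# plays GeoGebra. None of these names come from Draw2Think or GeoGebra.

@dataclass
class Action:
    tool: str   # name of a typed ToolSpec
    args: dict  # bound parameters

@dataclass
class Observation:
    ok: bool      # engine verdict for this action
    message: str  # structured readout returned to the policy

@dataclass
class ToyEngine:
    objects: dict = field(default_factory=dict)

    def step(self, action: Action) -> Observation:
        if action.tool == "point":
            name = action.args["name"]
            self.objects[name] = (action.args["x"], action.args["y"])
            return Observation(True, f"{name} = {self.objects[name]}")
        return Observation(False, f"unknown tool {action.tool!r}")

History = List[Tuple[Action, Observation]]

def run_episode(policy: Callable[[History], Optional[Action]],
                engine: ToyEngine, max_steps: int = 10) -> History:
    history: History = []
    for _ in range(max_steps):
        action = policy(history)  # the policy sees every earlier observation
        if action is None:        # stopping is the policy's decision
            break
        history.append((action, engine.step(action)))  # engine checks each action
    return history

SCRIPT = [Action("point", {"name": "A", "x": 0, "y": 0}),
          Action("point", {"name": "B", "x": 3, "y": 4}),
          Action("segmnt", {"from": "A", "to": "B"})]  # misspelled tool is rejected

def scripted_policy(history: History) -> Optional[Action]:
    return SCRIPT[len(history)] if len(history) < len(SCRIPT) else None

for action, obs in run_episode(scripted_policy, ToyEngine()):
    print(action.tool, obs.ok, obs.message)
```

The point of the sketch is that the rejection of the third action arrives as an observation before any further step, rather than being discovered after the whole trace is generated.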
Draw2Think couples a frozen VLM to a dynamic-geometry engine through typed ToolSpecs. In dynamic geometry systems such as GeoGebra, construction commands enforce geometric relationships algebraically rather than by coordinate approximation. Each accepted action is therefore engine-checkable, while the model still chooses the construction strategy.
Example ToolSpecs include line_through_perpendicular and circle_through_center.
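To make the contrast with coordinate approximation concrete, the toy below constructs the foot of a perpendicular with exact rational arithmetic, then snaps the same point to an integer pixel grid and measures how far each version is from satisfying the constraint. Python's `fractions` module is an assumption standing in for exact constraint handling; it is not a description of GeoGebra's internals, and `foot_of_perpendicular` is a hypothetical helper.

```python
from fractions import Fraction as F

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def foot_of_perpendicular(p, a, b):
    """Exact foot of the perpendicular from p onto line ab (hypothetical helper)."""
    d = sub(b, a)
    t = F(dot(sub(p, a), d), dot(d, d))  # rational parameter along ab
    return (a[0] + t * d[0], a[1] + t * d[1])

A, B, P = (F(0), F(0)), (F(3), F(1)), (F(2), F(3))
H = foot_of_perpendicular(P, A, B)

# Constraint residual: PH . AB must be zero for a true perpendicular.
exact_residual = dot(sub(P, H), sub(B, A))

# Pixel approximation: snap H to an integer grid, as a raster sketch would.
H_px = (F(round(H[0])), F(round(H[1])))
pixel_residual = dot(sub(P, H_px), sub(B, A))

print(H, exact_residual, pixel_residual)
```

Here the exact foot is (27/10, 9/10) and the residual is identically zero, while the snapped point (3, 1) leaves a residual of -1: the drawing looks right, but the relation no longer holds.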
Two properties become separately auditable: Construction Fidelity (model-level: did the canvas realize the intended configuration?) and Measurement Faithfulness (engine-level: are exact values and relations preserved by canvas constraints?).
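The two audits can be written as separate checks over an exact canvas. A minimal sketch, assuming a hypothetical canvas of named points with rational coordinates and an intended configuration expressed as predicates; `construction_fidelity` and `measurement_faithfulness` are illustrative function names, not the paper's metric implementations.

```python
from fractions import Fraction as F

# Hypothetical canvas: exact coordinates held for each named point.
canvas = {"A": (F(0), F(0)), "B": (F(4), F(0)), "C": (F(0), F(3))}

def right_angle_at(c, v, p, q):
    """True if angle pvq is exactly 90 degrees (zero dot product)."""
    (vx, vy), (px, py), (qx, qy) = c[v], c[p], c[q]
    return (px - vx) * (qx - vx) + (py - vy) * (qy - vy) == 0

# Intended configuration, written as predicates over the canvas.
intended = [lambda c: right_angle_at(c, "A", "B", "C")]

def construction_fidelity(c, predicates):
    """Model-level: did the canvas realize every intended relation?"""
    return all(pred(c) for pred in predicates)

def squared_length(c, p, q):
    (px, py), (qx, qy) = c[p], c[q]
    return (px - qx) ** 2 + (py - qy) ** 2

def measurement_faithfulness(c, reported_sq, p, q):
    """Engine-level: does a reported squared length equal the exact canvas value?"""
    return squared_length(c, p, q) == reported_sq

print(construction_fidelity(canvas, intended))                 # True
print(measurement_faithfulness(canvas, F(25), "B", "C"))        # True
print(measurement_faithfulness(canvas, F(2499, 100), "B", "C"))  # False
```

Keeping the two checks apart means a wrong final answer can be traced either to a canvas that never realized the configuration or to a value that drifted after a correct construction.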
The live demos replay five real Draw2Think trajectories from four datasets, aligning each engine command step with the model response, the engine output, and the resulting canvas state.
Constraint interaction separates latent strategy from checked state: the model still explores, while accepted canvas state is already engine-checked.
Visual sketches, text traces, and generated scripts all expose intermediate objects. Draw2Think moves the verification point earlier: each accepted action becomes a checked premise for the next action.
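One way to picture an accepted action becoming a checked premise is a canvas that accepts an action only when every input already exists and leaves its state untouched on rejection. This toy `Canvas` is hypothetical and checks only dependencies; a real geometry engine would also validate the construction itself.

```python
from dataclasses import dataclass, field

@dataclass
class Canvas:
    """Toy canvas: accepted objects become premises that later actions may cite."""
    objects: dict = field(default_factory=dict)  # name -> (kind, dependency names)

    def apply(self, name, kind, deps=()):
        if name in self.objects:
            return {"accepted": False, "reason": f"{name} already defined"}
        missing = [d for d in deps if d not in self.objects]
        if missing:
            # Rejected actions leave the canvas untouched.
            return {"accepted": False, "reason": f"undefined inputs: {missing}"}
        self.objects[name] = (kind, tuple(deps))
        return {"accepted": True, "premises": sorted(self.objects)}

canvas = Canvas()
print(canvas.apply("A", "point"))
print(canvas.apply("B", "point"))
print(canvas.apply("l", "line", ["A", "B"]))
print(canvas.apply("m", "perpendicular", ["C", "l"]))  # C was never constructed
print(sorted(canvas.objects))
```

The failed perpendicular does not leave a half-built object behind, so the next action still starts from a state in which every object has passed a check.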
The gain appears when the model needs exact measurements, a consistent construction, or a stable intermediate state. On easy or memorized routes, building the canvas can add cost without adding evidence.
A final answer does not separate perception, construction, measurement, and algebra errors. A canvas audit lets us ask whether the intermediate geometry itself was realized, independently of the final response.
Beyond geometry, Draw2Think shows how a generative model can use a deterministic engine for checks while keeping construction choices under model control.
Query tools turn exact engine state into evidence the final answer can cite. When that channel is removed, answers shift toward escape routes: internal reasoning, construction-return shortcuts, or unanchored final responses.
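A query tool in this sense reads a value off the exact state and returns it with its provenance, so the answer can point back to it. A minimal sketch, assuming a hypothetical canvas of rational points and a hypothetical `query_area` tool that uses the shoelace formula; it does not reproduce GeoGebra's query commands.

```python
from fractions import Fraction as F

# Hypothetical exact canvas state (named points with rational coordinates).
canvas = {"A": (F(0), F(0)), "B": (F(5), F(0)), "C": (F(1), F(3))}

def query_area(c, *names):
    """Hypothetical query tool: exact polygon area via the shoelace formula."""
    pts = [c[n] for n in names]
    twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return {"query": f"Area({', '.join(names)})", "value": abs(twice) / 2}

readout = query_area(canvas, "A", "B", "C")
# The final answer cites the readout instead of re-deriving it in free text.
answer = f"The area is {readout['value']} (from {readout['query']})."
print(answer)  # The area is 15/2 (from Area(A, B, C)).
```

Because the value is returned as an exact fraction rather than a rounded decimal, the answer inherits the engine's exactness instead of reintroducing approximation in text.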
With cacheable input context, marginal cost shifts toward generated reasoning. On high-thinking benchmarks, engine readouts cut thinking tokens by up to 36%, while ToolSpecs turn free-form text into typed calls and structured observations.
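The cost argument is simple arithmetic: if the input prefix is a cache hit, the per-call cost is dominated by output tokens, so a cut in thinking tokens translates almost directly into a cut in cost. The sketch below uses hypothetical token counts and per-million-token prices (not measured values) and treats all output tokens as thinking tokens; only the 36% reduction comes from the text.

```python
def marginal_cost(input_tokens, output_tokens, cached_price, output_price):
    """Dollars for one call, assuming the whole input prefix hits the cache.
    Prices are per million tokens."""
    return (input_tokens * cached_price + output_tokens * output_price) / 1_000_000

INPUT, THINKING = 20_000, 8_000          # hypothetical token counts
CACHED_PRICE, OUTPUT_PRICE = 0.25, 10.0  # hypothetical $ per million tokens

before = marginal_cost(INPUT, THINKING, CACHED_PRICE, OUTPUT_PRICE)
after = marginal_cost(INPUT, THINKING * 64 // 100, CACHED_PRICE, OUTPUT_PRICE)  # 36% fewer
output_share = THINKING * OUTPUT_PRICE / 1_000_000 / before

print(f"output share before: {output_share:.0%}")
print(f"cost before ${before:.4f}, after ${after:.4f}, saving {1 - after / before:.1%}")
```

Under these assumptions output accounts for about 94% of the cost, and a 36% cut in thinking tokens saves about 34% overall; the saving stays slightly below 36% because the cached input still costs something.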
The interface is a control surface for tool orchestration. Small changes to tool descriptions shift tool selection, parameter binding, and readout anchoring, because the model chooses among typed operations rather than emitting arbitrary pixels.
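Typed operations also make binding failures explicit: a call either matches a ToolSpec's parameters or comes back as a structured error the model can act on. A minimal sketch, assuming a JSON call format and a hypothetical `ToolSpec` with a description and typed parameters; only the tool name `line_through_perpendicular` appears in the source, and its fields here are invented for illustration.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str  # text the model reads when choosing a tool
    params: dict      # parameter name -> expected Python type

# Hypothetical spec: the tool name appears in the source, the fields do not.
SPECS = {
    "line_through_perpendicular": ToolSpec(
        "line_through_perpendicular",
        "Line through a given point, perpendicular to a given line.",
        {"point": str, "line": str},
    ),
}

def bind_call(raw: str) -> dict:
    """Bind a model-emitted JSON call to a ToolSpec, or return a structured error."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as err:
        return {"ok": False, "error": f"invalid JSON: {err.msg}"}
    if not isinstance(call, dict) or not isinstance(call.get("args", {}), dict):
        return {"ok": False, "error": "expected an object with a dict 'args'"}
    tool = call.get("tool")
    spec = SPECS.get(tool) if isinstance(tool, str) else None
    if spec is None:
        return {"ok": False, "error": f"unknown tool {tool!r}"}
    args = call.get("args", {})
    problems = {
        "missing": sorted(set(spec.params) - set(args)),
        "extra": sorted(set(args) - set(spec.params)),
        "wrong_type": sorted(k for k, t in spec.params.items()
                             if k in args and not isinstance(args[k], t)),
    }
    if any(problems.values()):
        return {"ok": False, "error": problems}
    return {"ok": True, "tool": spec.name, "args": args}

print(bind_call('{"tool": "line_through_perpendicular", "args": {"point": "P", "line": "l"}}'))
print(bind_call('{"tool": "line_through_perpendicular", "args": {"point": 3}}'))
```

The second call fails with a missing `line` and a mistyped `point`, which is the kind of precise feedback a free-form text trace cannot return.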
GeoGebra can reject invalid constructions and return exact observations. Auxiliary-object selection and stopping decisions remain with the model, so residual failures point to policy-level planning.
Future work could use Draw2Think to study process evidence directly, treating final-answer gains as one metric among canvas audits, planning signals, and reusable trajectory data.
Future harnesses can return dependency graphs, unresolved constraints, and symbolic query opportunities so the model can reason from the construction plan alongside visible objects, measurements, and final pixels.
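Such an observation could pair a dependency-respecting build order with the goal constraints that cannot yet be evaluated. A minimal sketch using the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the payload shape, object names, and `plan_observation` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical observation payload: object -> objects it was constructed from.
dependencies = {
    "A": set(), "B": set(), "C": set(),
    "BC": {"B", "C"},   # line through B and C
    "h": {"A", "BC"},   # perpendicular from A to BC
    "H": {"h", "BC"},   # foot of the altitude
}
# Hypothetical goal constraints, each with the objects it mentions.
constraints = [
    ("AH perpendicular to BC", {"A", "H", "BC"}),
    ("M is the midpoint of AH", {"M", "A", "H"}),
]

def plan_observation(deps, goals):
    """Return a dependency-respecting build order and goals not yet constructible."""
    order = list(TopologicalSorter(deps).static_order())
    unresolved = [text for text, objs in goals if not objs.issubset(deps)]
    return {"construction_order": order, "unresolved": unresolved}

obs = plan_observation(dependencies, constraints)
print(obs["construction_order"])
print(obs["unresolved"])
```

Reporting `M is the midpoint of AH` as unresolved tells the model which auxiliary object is still missing, rather than leaving it to infer that from the rendered canvas.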
Geometry isolates the setting: the engine rejects invalid actions and exposes local state. Similar loops may extend to physical-world AI when tasks have typed actions, an executable twin, and cheap local checks before acting in the real system.
A Draw2Think trajectory contains typed dependencies, engine verdicts, and concrete canvas effects. That makes it a denser training signal than a final answer or a free-form explanation.
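To make the shape of such a record concrete, here is a hypothetical per-step schema serialized as JSON Lines. The `Step` fields, the JSON Lines format, and the arguments shown for both tools are assumptions for illustration, not the released data format.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Step:
    tool: str         # typed ToolSpec name
    args: dict        # bound parameters
    depends_on: list  # objects the call reads
    verdict: str      # engine verdict, e.g. "accepted" or "rejected"
    effect: dict = field(default_factory=dict)  # objects created or changed

trajectory = [
    Step("line_through_perpendicular", {"point": "A", "line": "BC"}, ["A", "BC"],
         "accepted", {"created": ["h"]}),
    Step("circle_through_center", {"center": "O", "point": "Q"}, ["O", "Q"],
         "rejected"),
]

# One JSON object per step: each line carries its own verdict and effect.
jsonl = "\n".join(json.dumps(asdict(s), sort_keys=True) for s in trajectory)
accepted = sum(s.verdict == "accepted" for s in trajectory)
print(jsonl)
print(f"{accepted}/{len(trajectory)} steps accepted")
```

A final-answer label would give one bit for this whole trajectory; the per-step records say which action failed and what each accepted action added.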
If Draw2Think (or the live demos on this project page) is useful for your research, please cite:
@article{hu2026draw2think,
title = {Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction},
author = {Hu, Juncheng and Du, Jiawei and Zhang, Xin and Zhou, Joey Tianyi},
journal = {arXiv preprint arXiv:2605.20743},
year = {2026},
url = {https://draw2think.github.io}
}