AI that proves theorems from a "blueprint": Goedel-Architect for formal theorem proving in Lean 4
An agentic framework for formal theorem proving in Lean 4 that generates and refines a "blueprint", a dependency graph of definitions and lemmas. A tool-equipped Lean prover closes the lemma nodes in parallel, and failed lemmas drive refinement of the blueprint, avoiding the dead-end loops of recursive decomposition. With an open-weight backbone it reaches 99.2% on MiniF2F-test and 75.6% on PutnamBench (pass@1), rising to 88.8% on PutnamBench when seeded with a natural-language proof. The authors report this as state of the art for an open-source pipeline.
Paper overview (our summary)
- Field (arXiv category): cs.AI
- Authors: Jui-Hui Chung, Ziyang Cai, Zihao Li, et al. (17 authors)
- Submitted: 2026-06-04
- arXiv ID: 2606.06468v1
Key points
- An agentic Lean 4 theorem-proving framework built on blueprint generation/refinement
- Builds a dependency graph of definitions/lemmas; closes lemma nodes in parallel; failures refine the blueprint
- Avoids the dead-end loops of recursive decomposition
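- Open-weight backbone (DeepSeek-V4-Flash): 99.2% on MiniF2F-test and 75.6% on PutnamBench, pass@1 (88.8% on PutnamBench when seeded with a natural-language proof)
- State of the art for an open-source pipeline, at up to 500x lower cost than comparable pipelines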
This work, Goedel-Architect, has an AI plan a "blueprint" before it formally proves a theorem.
Goedel-Architect is an agentic framework for formal theorem proving in the Lean 4 proof assistant, centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem.
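As a toy illustration (not taken from the paper), a blueprint can be pictured as a Lean 4 file in which one definition node and two lemma nodes feed the main theorem. Every name below (`double`, `double_eq_two_mul`, `double_add`, `main_theorem`) is hypothetical, and the sketch assumes only core Lean 4 (including the `omega` tactic), not Mathlib. At the blueprint stage each lemma's proof would be a `sorry` placeholder that the prover later replaces.

```lean
-- Definition node (hypothetical example, not from the paper).
def double (n : Nat) : Nat := n + n

-- Lemma node 1; depends on `double`.
-- In the blueprint this proof starts as `sorry`; the prover closes it.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega

-- Lemma node 2; depends on `double`. It does not depend on lemma 1,
-- so both lemmas can be attempted in parallel.
theorem double_add (a b : Nat) : double (a + b) = double a + double b := by
  unfold double
  omega

-- Main theorem; depends on both lemma nodes and is proved from their statements.
theorem main_theorem (a b : Nat) : double (a + b) = 2 * a + 2 * b := by
  rw [double_add, double_eq_two_mul, double_eq_two_mul]
```

The main theorem only needs the statements of its dependencies, which is why it can be checked before the lemma proofs themselves are finished.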
First, it generates a blueprint of formally stated definitions and lemmas, along with their declared dependencies (optionally guided by a natural-language proof). Then a tool-equipped Lean prover closes each open lemma node in parallel, using the relevant dependencies. Failed lemmas, in turn, drive refinement of the global blueprint. This contrasts with mainstream approaches that rely on recursive lemma decomposition and can loop inefficiently on dead-end strategies.
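The generate, prove-in-parallel, refine loop described above can be sketched in Python. This is a minimal sketch under loud assumptions: `Node`, `run_blueprint`, `toy_prove`, and `toy_refine` are hypothetical names, the real prover is a tool-equipped LLM that calls Lean, and the toy "difficulty" rule and add-one-helper-lemma refinement merely stand in for the paper's actual prover and refinement policy, which this summary does not describe.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """One blueprint node: a formally stated lemma (hypothetical schema)."""
    name: str
    difficulty: int  # toy stand-in for how hard the statement is to prove
    deps: list[str] = field(default_factory=list)
    proved: bool = False


Blueprint = dict[str, Node]
Prover = Callable[[Node, list[Node]], bool]
Refiner = Callable[[Blueprint, list[str]], Blueprint]


def run_blueprint(bp: Blueprint, prove: Prover, refine: Refiner,
                  max_rounds: int = 5) -> tuple[bool, int]:
    """Close open nodes in parallel; failures feed a global blueprint refinement.

    Returns (all nodes proved?, number of proving rounds used).
    """
    for rounds in range(max_rounds):
        open_nodes = [n for n in bp.values() if not n.proved]
        if not open_nodes:
            return True, rounds
        # Each open node is attempted independently, given its declared
        # dependencies (which may themselves still be open).
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(
                lambda n: prove(n, [bp[d] for d in n.deps]), open_nodes))
        failed = []
        for node, ok in zip(open_nodes, results):
            node.proved = ok
            if not ok:
                failed.append(node.name)
        if failed:
            # Revise the whole graph instead of recursing into the failed lemma.
            bp = refine(bp, failed)
    return all(n.proved for n in bp.values()), max_rounds


def toy_prove(node: Node, deps: list[Node]) -> bool:
    # Toy rule: each declared dependency lowers the effective difficulty by one.
    return node.difficulty <= 1 + len(deps)


def toy_refine(bp: Blueprint, failed: list[str]) -> Blueprint:
    # Toy rule: give each failed lemma one new, easy helper lemma to depend on.
    new_bp = dict(bp)  # shallow copy: Node objects are shared with `bp`
    for name in failed:
        helper = Node(f"{name}_aux{len(new_bp[name].deps) + 1}", difficulty=1)
        new_bp[helper.name] = helper
        new_bp[name].deps.append(helper.name)
    return new_bp


bp = {
    "main": Node("main", difficulty=1, deps=["lemA", "lemB"]),
    "lemA": Node("lemA", difficulty=1),
    "lemB": Node("lemB", difficulty=2),
}
ok, rounds = run_blueprint(bp, toy_prove, toy_refine)
print(ok, rounds)  # True 2
```

In the toy run, `lemB` fails in the first round, the refiner adds a helper lemma `lemB_aux1` for it to depend on, and the second round closes both. The choice to revise the shared graph rather than recurse into the failed lemma mirrors the contrast the paper draws with recursive decomposition.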
Using the open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, Goedel-Architect attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. When an optional natural-language proof seeds the initial blueprint on harder problems, it closes the remaining two MiniF2F-test problems (100%), lifts PutnamBench to 88.8% (597/672), and solves 4 of 6 problems on IMO 2025, 11 of 12 on Putnam 2025, and 3 of 6 on USAMO 2026. The authors describe these results as state of the art for an open-source pipeline, at up to 500x lower cost than comparable pipelines.
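The headline percentages can be cross-checked against problem counts. This sketch assumes the standard 244-problem miniF2F test split (the summary does not state the size); the 672 PutnamBench total comes from the 597/672 figure above, and the count of 508 is only what 75.6% implies, not a number the summary reports.

```python
# Recompute the quoted pass@1 percentages from problem counts.
MINIF2F_TEST_SIZE = 244  # assumption: standard miniF2F test split
PUTNAMBENCH_SIZE = 672   # taken from "597/672" in the text


def pct(solved: int, total: int) -> float:
    """Percentage of problems solved, rounded to one decimal place."""
    return round(100 * solved / total, 1)


def implied_solved(percent: float, total: int) -> int:
    """Smallest solved count whose rounded percentage equals `percent`."""
    return next(k for k in range(total + 1) if pct(k, total) == percent)


print(implied_solved(99.2, MINIF2F_TEST_SIZE))  # 242, i.e. 2 of 244 unsolved
print(pct(597, PUTNAMBENCH_SIZE))               # 88.8
print(implied_solved(75.6, PUTNAMBENCH_SIZE))   # 508 (implied, not reported)
```

Under the 244-problem assumption, 99.2% leaves exactly two MiniF2F-test problems open, which matches the "remaining two" that the natural-language seeding closes.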
Why it matters
Goedel-Architect is an example of rapid progress in AI-driven formal mathematics and automated theorem proving. It is useful for tracking AI reasoning ability, applications to mathematics and software verification, and the competitiveness of open-source models.
FAQ
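What is formal theorem proving?
Writing a proof in a proof assistant such as Lean 4, whose kernel mechanically checks every step, so an accepted proof contains no unverified gaps.
Why a "blueprint" approach?
Planning the whole proof as a dependency graph lets the prover attack all lemmas in parallel, and a failed lemma triggers a revision of the global plan instead of an ever-deeper recursive decomposition that can loop on a dead-end strategy.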
Sources (primary)
Source: arXiv (descriptive metadata is CC0 public domain). Summaries are our own; see arXiv for the original text and PDF.
- arXiv abstract page (original, official)
- PDF (arXiv)
- arXiv ID: 2606.06468