AI that proves theorems from a "blueprint" — "Goedel-Architect" for formal theorem proving in Lean 4

cs.AI ・Jui-Hui Chung, Ziyang Cai, Zihao Li, et al. (17) ・Jun 2026

An agentic framework for formal theorem proving in Lean 4 that generates and refines a "blueprint": a dependency graph of definitions and lemmas. A tool-equipped Lean prover closes the lemma nodes in parallel, and failed lemmas drive blueprint refinement, avoiding the dead-end loops that recursive decomposition can fall into. On an open-weight backbone it reaches 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench (88.8% when seeded with a natural-language proof), state of the art for an open-source pipeline.

Paper overview (our summary)

Field (arXiv category): cs.AI
Authors: Jui-Hui Chung, Ziyang Cai, Zihao Li, et al. (17)
Submitted: 2026-06-04
arXiv ID: 2606.06468v1

Key points

An agentic Lean 4 theorem-proving framework built on blueprint generation/refinement
Builds a dependency graph of definitions/lemmas; closes lemma nodes in parallel; failures refine the blueprint
Avoids the dead-end loops of recursive decomposition
Open-weight backbone: 99.2% pass@1 on MiniF2F-test, 75.6% pass@1 on PutnamBench (88.8% with a natural-language proof)
State of the art for an open-source pipeline, at up to 500x lower cost than comparable pipelines

Goedel-Architect has the AI plan a "blueprint" before it attempts to formally prove a theorem.

Goedel-Architect is an agentic framework for formal theorem proving in the Lean 4 proof assistant, centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem.
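To make the idea of a blueprint as a dependency graph concrete, here is a minimal Python sketch. It is not the paper's data format: the node names (`def_double`, `lemma_double_zero`, and so on) and the helper `emission_order` are hypothetical, chosen only for illustration. The sketch stores each node with the nodes it depends on and uses the standard-library `graphlib` module to find an order in which the declarations could be written so that every dependency comes before its users.

```python
from graphlib import TopologicalSorter

# Hypothetical toy blueprint (not from the paper):
# each node maps to the set of nodes it depends on.
blueprint = {
    "def_double": set(),
    "lemma_double_zero": {"def_double"},
    "lemma_double_succ": {"def_double"},
    "main_theorem": {"lemma_double_zero", "lemma_double_succ"},
}


def emission_order(graph):
    """Return the nodes in an order where dependencies come first.

    Raises graphlib.CycleError if the declared dependencies are circular,
    which would mean the blueprint is not a valid dependency graph.
    """
    return list(TopologicalSorter(graph).static_order())


order = emission_order(blueprint)
print(order)
```

The two lemma nodes depend only on the definition and not on each other, which is what would allow them to be attempted independently.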

First, it generates a blueprint of formally stated definitions and lemmas, along with declared dependencies (optionally guided by a natural-language proof). Then a tool-equipped Lean prover component closes each open lemma node in parallel using relevant dependencies. Failed lemmas, in turn, drive refinement of the global blueprint. This contrasts with mainstream approaches that use recursive lemma decomposition and can inefficiently loop on dead-end strategies.
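The generate, prove-in-parallel, refine cycle described above can be sketched in Python as follows. Everything here is an assumption made for illustration: `stub_prover` and `stub_refine` are hypothetical stand-ins for the Lean prover and the blueprint refiner, the `HARD` set fakes which lemmas fail, and the refinement rule (add one helper lemma per failure) is invented, not the paper's method.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: lemmas in HARD fail until a helper lemma is added for them.
HARD = {"lemma_b"}


def stub_prover(name, deps):
    """Hypothetical stand-in for a tool-equipped Lean prover."""
    return name not in HARD or f"{name}_helper" in deps


def stub_refine(blueprint, failed):
    """Hypothetical refinement: give each failed lemma a new helper lemma."""
    refined = {name: set(deps) for name, deps in blueprint.items()}
    for name in failed:
        helper = f"{name}_helper"
        refined[helper] = set()
        refined[name].add(helper)
    return refined


def solve(blueprint, max_rounds=3):
    """Try all open nodes in parallel; failures trigger a blueprint refinement."""
    closed = set()
    for _ in range(max_rounds):
        open_nodes = [n for n in blueprint if n not in closed]
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda n: stub_prover(n, blueprint[n]), open_nodes))
        closed.update(n for n, ok in zip(open_nodes, results) if ok)
        failed = [n for n, ok in zip(open_nodes, results) if not ok]
        if not failed:
            break
        blueprint = stub_refine(blueprint, failed)
    return blueprint, closed


start = {"lemma_a": set(), "lemma_b": {"lemma_a"}, "main": {"lemma_a", "lemma_b"}}
final_blueprint, closed = solve(start)
print(sorted(final_blueprint), sorted(closed))
```

In this toy run `main` closes in the first round even though `lemma_b` is still open, because each node is attempted using only the statements of its dependencies. A real system would also have to recheck nodes whose dependencies change during refinement, which this sketch omits.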

Using the open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, Goedel-Architect attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. When an optional natural-language proof seeds the initial blueprint on harder problems, it closes the remaining two MiniF2F-test problems (100%), lifts PutnamBench to 88.8% (597/672), and solves 4/6 problems on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. These results are state of the art for an open-source pipeline, at up to 500x lower cost than comparable pipelines.
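As a quick consistency check on the reported figures, the short Python sketch below recomputes the percentages from problem counts. Only 597/672 and "the remaining two" MiniF2F-test problems come from the text above. The MiniF2F-test size of 244 is an assumption taken from the public benchmark, and the base PutnamBench count is inferred from the 75.6% figure rather than reported.

```python
def pct(solved, total):
    """Percentage solved, rounded to one decimal place."""
    return round(100 * solved / total, 1)


PUTNAM_TOTAL = 672    # from the reported 597/672
MINIF2F_TOTAL = 244   # assumption: size of the MiniF2F-test split

putnam_seeded = pct(597, PUTNAM_TOTAL)                 # reported as 88.8
minif2f_base = pct(MINIF2F_TOTAL - 2, MINIF2F_TOTAL)   # two problems left open

# Inferred, not reported: the solved count implied by 75.6% of 672.
putnam_base_solved = round(0.756 * PUTNAM_TOTAL)
putnam_base = pct(putnam_base_solved, PUTNAM_TOTAL)

print(putnam_seeded, minif2f_base, putnam_base_solved, putnam_base)
```

Under these assumptions the numbers agree with one another: two open problems out of 244 rounds to 99.2%, and 75.6% of 672 corresponds to roughly 508 solved problems.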

Why it matters

This is an example of rapid progress in AI-driven formal mathematics and automated theorem proving. It is useful for tracking AI reasoning, applications to mathematics and software verification, and the strength of open-source models.

FAQ

What is formal theorem proving?

It means writing mathematical proofs in a machine-checkable form (e.g., in Lean). Automating this with AI has applications in mathematics and software verification.

Why a "blueprint" approach?

It lays out the overall plan first (definitions, lemmas, and their dependencies), proves the lemmas in parallel, and then refines only the parts that got stuck, which avoids wasteful loops.

Sources (primary)

Source: arXiv (descriptive metadata is CC0 public domain). Summaries are our own; see arXiv for the original text and PDF.

arXiv abstract page (original, official)
PDF (arXiv)
arXiv ID: 2606.06468
