130k Lines of Formal Topology: Simple and Cheap Autoformalization for Everyone?

2026-03-0323:173411arxiv.org

This is a brief description of a project that has already autoformalized a large portion of the general topology from the Munkres textbook (which has in total 241 pages in 7 chapters and 39 sections).…

Show article

View PDF

Abstract:This is a brief description of a project that has already autoformalized a large portion of the general topology from the Munkres textbook (which has in total 241 pages in 7 chapters and 39 sections). The project has been running since November 21, 2025 and has as of January 4, 2026, produced 160k lines of formalized topology. Most of it (about 130k lines) have been done in two weeks,from December 22 to January 4, for an LLM subscription cost of about \$100. This includes a 3k-line proof of Urysohn's lemma, a 2k-line proof of Urysohn's Metrization theorem, over 10k-line proof of the Tietze extension theorem, and many more (in total over 1.5k lemmas/theorems). The approach is quite simple and cheap: build a long-running feedback loop between an LLM and a reasonably fast proof checker equipped with a core foundational library. The LLM is now instantiated as ChatGPT (mostly 5.2) or Claude Sonnet (4.5) run through the respective Codex or Claude Code command line interfaces. The proof checker is Chad Brown's higher-order set theory system Megalodon, and the core library is Brown's formalization of basic set theory and surreal numbers (including reals, etc). The rest is some prompt engineering and technical choices which we describe here. Based on the fast progress, low cost, virtually unknown ITP/library, and the simple setup available to everyone, we believe that (auto)formalization may become quite easy and ubiquitous in 2026, regardless of which proof assistant is used.

From: Josef Urban [view email]
[v1] Tue, 6 Jan 2026 01:01:04 UTC (114 KB)

Read the original article

Comments

By bmenrigh 2026-03-043:312 reply

My main skepticism here is whether the theorems have been properly replicated in the proof. Verifying that the proof really captured the mathematical statement seems like a manual, human process, and quite hard to repeat reliably across proofs.

By jaen 2026-03-0413:03

At least for the main accomplishment, the Tietze extension theorem, the natural language statement and the Megalodon theorem looked quite easy to correspond, and the dependency chain seemed fairly easy as well. (the fact that the proof is 10k lines is irrelevant, this is just about the theorem statement itself)

This project did not deal with some cutting-edge, esoteric or puzzle-type maths for which the formalization can be tricky - just well-trodden topology.

By akoboldfrying 2026-03-044:061 reply

Maybe this is what you meant, but the dangerous part is ensuring that the final claim being proved is correct -- the actual proof of that claim is, by design, something that can be quickly and mechanically verified by applying a series of simple rules to a set of accepted axioms. Even a human could (somewhat laboriously) do this verification.

I have no experience with this, but I'd expect the danger to arise from implicit assumptions that are obvious to a mathematician in the relevant subfield but not necessarily to an LLM. E.g., whether we are dealing with real-valued or integer-valued variables can make a big difference in many problems, but might only be actually explicitly stated once in the very first chapter (or even implied by the book's title).

There are also many types of "overloaded" mathematical notation -- usually a superscripted number means "raised to the power", but if that number is -1, it might instead mean "the inverse of this function".

By zozbot234 2026-03-044:59

This is correct, with the proviso that the meaning of the claim/statement depends on the definitions and checking these can be a bit tedious. But still feasible, and better by orders of magnitude than "checking" the proofs. Abuses of notation and just plain sloppiness are a big issue though if you wish to auto-translate an informal development into correct definitions and statements for formalization.

By esafak 2026-03-042:411 reply

Are there any lessons or tricks here that we can apply to software engineering? What kinds of assertions can we express more easily with Lean etc. than with unit tests, fuzz tests, property tests, using our existing testing frameworks?

By xiphias2 2026-03-043:17

The main lesson is quite simple: if you can write the test to be uncheatable, ChatGPT can write the code for it.

By QuesnayJr 2026-03-047:06

What's most impressive about it is that it uses a system, Megaladon, which is pretty obscure. There isn't that much documentation or examples, but it can still formalize an advanced undergraduate textbook.

Hacker News