Lil' Fun Langs

2026-02-2017:3412323taylor.town

a pungent monad odor that attracts mathochists

I adore small programming languages. Iota is two combinators. tinylisp is 99 lines of C. milliForth is 340 bytes. Fractran multiplies fractions. Oh, K?

I've encountered tiny implementations of Forth, Lisp, C, Prolog, etc., but never "milliHaskell".

ML-style languages carry a pungent monad odor that attracts mathochists. Notable examples include Haskell, Elm, F#, Scala, and OCaml. They're "Lambda Calculus with syntactic sugar", i.e. functional and statically-typed. Most implementations extend Hindley-Milner type inference with algebraic data types, pattern matching, and closures:

Feature LOC Dependencies References
Integer arithmetic ~50 Parser, codegen MinCaml
Floating-point ~100 Parser, codegen (SSE/NEON) MinCaml
Booleans + if/then/else ~50 Parser, codegen Everything
Let bindings ~30 Parser, normalization Everything
First-class functions (closures) ~200 Closure conversion, runtime MinCaml
Recursive functions (let rec) ~50 Type inference (occurs check), codegen MinCaml
Tuples ~100 Parser, type inference, codegen MinCaml
Arrays ~100 Parser, runtime (bounds checking) MinCaml
Monomorphic type inference ~100 Unification MinCaml
Polymorphic type inference (HM) ~300 Generalization, instantiation Algorithm W, PLZoo
Algebraic data types ~200–400 Parser, type checker, runtime (tagging) HaMLet, Tao
Pattern matching (basic) ~200 Exhaustiveness check, case trees Tao, Ante
Pattern matching (optimized) ~400–600 Maranget's algorithm OCaml, Rust
Type classes ~500–2000 Dictionary passing, instance resolution MicroHs, Ben Lynn
Modules (basic) ~500–1000 Namespace management HaMLet
Modules (functors/signatures) ~2000–5000 Type-level computation HaMLet, 1ML
Row polymorphism ~300–800 Extended unification EYG, type-systems
Algebraic effects ~500–1500 Effect typing, runtime support Eff, Frank, Ante
Algebraic subtyping ~500 Polar types, biunification Simple-sub
Linear types ~600 Linearity checker Austral
Lazy evaluation ~300–500 Thunks, memoization runtime MicroHs, Ben Lynn
Garbage collection (Cheney) ~200 Runtime system Most
Tail call optimization ~50–100 Codegen (jump instead of call) MinCaml
Inline expansion ~100 Normalization pass MinCaml
Dead code elimination ~50 Free variable analysis MinCaml
Totality checking ~300–500 Coverage analysis, termination checker Tao, Dhall

Further reading:

  • Write You a Haskell (and sequel): builds a Haskell subset incrementally: lambda calculus → STLC → HM inference → ADTs → pattern matching → type classes → STG → LLVM.
  • Implementing Functional Languages: a tutorial by Simon Peyton Jones & David Lester: complete implementations of template instantiation, G-Machine, TIM, and parallel G-Machine for a lazy Core language. Reimplemented in C++ with LLVM by Daniel Fedorin.
  • The ZINC experiment: the foundational paper behind OCaml's bytecode compiler. The ZINC abstract machine uses ~140 instructions and 7 registers. Implementations include OMicroB (running OCaml bytecode on PIC18 microcontrollers with <10KB RAM) and HardCaml-Zinc (hardware implementation).
  • Elaboration Zoo: progressive dependent type checking implementations, each a single Haskell file of 200–800 lines, from basic NbE through holes, implicit arguments, and first-class polymorphism. The best resource for understanding modern elaboration. Its companion smalltt (~1–2K LOC Haskell) is a complete dependent type elaborator with normalization-by-evaluation.
  • Modern Compiler Implementation in ML: the Tiger language compiler covers every phase from lexing through graph-coloring register allocation in ~5,000–8,000 LOC of SML. Multiple GitHub implementations target x86-64 and RISC-V.

If you want a milliHaskell, all your inspiration/ingredients are right here.

Hirrolot's CoC

🤖 The most extreme capability-to-size ratio in this list — a complete Calculus of Constructions (the type theory at the top of the lambda cube) with bidirectional typing, dependent function types, and a type-in-type universe, all in a single OCaml gist of ~60–80 lines. It can express length-indexed vectors and other dependently typed programs. Not ML-family per se, but it demonstrates that full dependent types need not be complex to implement.

Harrop MiniML

🤖 MiniML demonstrates the absolute floor for a native-code ML compiler. Using Camlp4 for parsing and OCaml's LLVM bindings, it supports integer arithmetic, conditionals, and recursive first-order functions. Xavier Leroy noted the critical caveat: this is not truly "Mini-ML" since it lacks higher-order first-class functions — adding closures and garbage collection would significantly expand the codebase. Still, it shows what LLVM enables in ~100 lines.

Algorithm W

🤖 Algorithm W Step by Step by Martin Grabmüller (~300 LOC, literate Haskell) is the canonical educational implementation of Algorithm W for Hindley-Milner type inference. Self-contained, well-commented, and widely referenced — this is where most people first implement HM inference.

tomprimozic/type-systems

🤖 A collection of standalone implementations of several inference algorithms in OCaml (~300–600 LOC total): basic Algorithm W, row polymorphism (the technique foundational to Elm's original type system), and HMF (first-class polymorphism with partial inference). Each variant is self-contained in a single directory. Where Algorithm W Step by Step teaches you one algorithm well, this repository shows you what changes when you swap in more powerful type system features.

THIH

🤖 Typing Haskell in Haskell by Mark P. Jones is the definitive executable specification of Haskell 98's complete type system in just 429 lines of core Haskell. It covers kinds, qualified types, type classes, pattern matching types, binding groups, mutual recursion, and defaulting. For context, the Hugs type checker implementing the same semantics spans 90+ pages of C. THIH is a type checker only (no evaluation), but its density of specification per line of code is unmatched.

🤖 ~500 LOC of Scala. Lionel Parreaux's clean reimplementation of Stephen Dolan's MLsub — algebraic subtyping that adds union and intersection types to Hindley-Milner while preserving principal types. No annotations required. The original MLsub won POPL 2017; Simple-sub distills it into an ICFP 2020 Pearl that's small enough to read in one sitting. The ancestor of MLscript, which grows the idea into a full language with OOP and TypeScript interop.

PLZoo poly

🤖 ~400–600 LOC, OCaml. Implements a lazy, purely functional language with parametric polymorphism and HM type inference. Its sibling miniml (~300–500 LOC) includes a compiler targeting an abstract machine. Both are part of Andrej Bauer's Programming Languages Zoo, which contains 12+ miniature language implementations, each a few hundred lines of OCaml, covering everything from untyped lambda calculus to call-by-push-value.

EYG

🤖 ~500 LOC JavaScript interpreter, full implementation in Gleam. EYG ("Eat Your Greens") by Peter Saxton prioritizes predictability, portability, and crash-free programs. It uses row-typed inference (HM extended with row polymorphism), algebraic effects as the sole FFI mechanism, and closure serialization — functions can be sent to other machines for tierless client/server programming. The most distinctive feature: programs are stored as JSON ASTs, not text files. A structural editor makes it impossible to write syntactically invalid programs.

Pico-ml

🤖 An OCaml subset with HM type inference that compiles to WebAssembly, implemented in TypeScript. Small and self-contained — unusual for having a TypeScript host language rather than the OCaml/Haskell norm. A good starting point if you want to understand ML compilation targeting the browser.

TinyML

🤖 <700> Packs a lexer, parser, interpreter, and full polymorphic HM type checker into under 700 lines of SML. Referenced on Lambda the Ultimate, this may be the smallest complete implementation with genuine Hindley-Milner inference, though the original download link appears to have gone stale.

Eff

🤖 The original algebraic effects language (2012) by Andrej Bauer and Matija Pretnar. OCaml syntax with effect handlers as first-class constructs — you declare effect operations, then install handlers that give them meaning. This is where the idea was first made concrete in a running implementation. Koka, Frank, OCaml 5's effect handlers, and virtually every subsequent algebraic effects system trace lineage here.

Frank

🤖 "Do Be Do Be Do" (POPL 2017) by Sam Lindley, Conor McBride, and Craig McLaughlin. A strict effectful functional language where functions are handlers that handle zero effects — and multihandlers generalize function abstraction to handle multiple effect interfaces simultaneously. The insight: the boundary between "function" and "effect handler" is artificial. Implemented in Haskell. Lindley describes it as "the one I'm most fond of" while noting it's "basically unmaintained." That tension between conceptual elegance and practical neglect is the story of many languages on this list.

Grace

🤖 A JSON superset with bidirectional type checking and row polymorphism, by Gabriella Gonzalez (author of Dhall). Designed explicitly as a "ready-to-fork" language skeleton — if you need a typed DSL, clone Grace and customize it. Has open records, open unions (polymorphic variants), and a clean Haskell codebase that reads like a tutorial. No Hindley-Milner per se (bidirectional instead), but closely related.

Hackett

🤖 A Haskell-like language implemented entirely as Racket macros via the "Type Systems as Macros" technique, by Alexis King. Bidirectional type inference, algebraic datatypes, pattern matching, typeclasses, higher-kinded types, and higher-rank polymorphism — all implemented not as a separate type-checker pass but as macro expansion. The meta-angle is the story: types as macros rather than a traditional elaboration pipeline.

Scrapscript

🤖 A content-addressable pure functional language where every expression reduces to a cryptographic hash, stored in a decentralized "scrapyard" registry and referenced by hash or alias. The implementation is a ~1,300-line dependency-free Python interpreter in a single file, with a baseline compiler to C (~500 LOC) and an SSA IR with SCCP/DCE optimization (~1,000 LOC). Pattern matching is the sole control-flow mechanism. Compiles to C, WebAssembly, and Cosmopolitan portable executables. Implemented primarily by Max Bernstein.

MinCaml

🤖 ~2,000 LOC, OCaml → native code. The gold standard for capability-to-code-size ratio. Written by Eijiro Sumii at Tohoku University, it implements a strict, higher-order functional language with type inference, closures, tuples, arrays, tail-call optimization, inline expansion, constant folding, and graph-coloring register allocation. It compiles to SPARC, PowerPC, and x86 assembly. On benchmarks including a ray tracer, MinCaml-compiled code runs within 2× of GCC and OCaml's ocamlopt — sometimes faster. The deliberate trade-off: it omits polymorphism, algebraic data types, and pattern matching. Used in undergraduate compiler courses at the University of Tokyo since 2001, where students build ray tracers compiled by their own compilers running on custom CPUs.

  • Repo: github.com/esumii/min-caml
  • Paper: "MinCaml: A Simple and Efficient Compiler for a Minimal Functional Language" (FDPE 2005)
  • Forks: gocaml (Go + LLVM reimplementation), miniml (OCaml + LLVM, ~1,500 LOC, adds LLVM backend to MinCaml's architecture)

Ben Lynn

🤖 ~2,000 lines of Haskell + 350 lines of C. Arguably the most remarkable bootstrapping achievement in this space. Starting from a 350-SLOC C runtime that interprets combinatory logic, Lynn builds a chain of approximately 20 progressively more capable compilers, each written in the subset of Haskell that the previous compiler can handle. The final compiler supports type inference, type classes, algebraic data types, pattern matching, guards, where clauses, monadic I/O, modules, and layout parsing — approaching Haskell 98 coverage. It compiles Haskell to combinatory logic via Kiselyov's bracket abstraction algorithm, with graph reduction evaluation. Later stages even target WebAssembly. The entire bootstrapping chain is reproducible from just a C compiler.

1ML

🤖 ~3,000–5,000 LOC, OCaml. Andreas Rossberg unified ML's core and module layers into a single language where modules are first-class values, types are values, and functors are ordinary functions. It elaborates to System Fω with HM-style inference. Won the ICFP Most Influential Paper Award in 2025. A proof-of-concept interpreter, not optimized, but a conceptual breakthrough in minimal surface area.

mlml

🤖 A self-hosting OCaml subset compiler targeting native x86-64. ~3,000–5,000 LOC. Supports pattern matching, algebraic data types, recursive functions, and closures. Does not implement type inference — it demonstrates the minimum OCaml subset needed for self-compilation.

Dhall

🤖 A total (non-Turing-complete) typed configuration language. ~4K LOC core Haskell. Normalization is guaranteed to terminate — you can always reduce a Dhall expression to a normal form, which means imports resolve, functions inline, and what you get is plain data. Based on a Calculus-of-Constructions-derived type theory with records, unions, and natural numbers. Has a formal specification and implementations in Haskell, Rust, Go, and Clojure.

Ante

🤖 Combines HM type inference, algebraic data types, pattern matching, algebraic effects, and an ownership-like system for shared mutability. Written in Rust, it uses Cranelift for native code generation. Actively developed, aiming to bridge the Rust/OCaml divide.

Tao

🤖 Surprisingly feature-rich for its size: generics, typeclasses, sum types, pattern matching, first-class functions, currying, algebraic effects, associated types, and totality checking. Its pipeline runs from lexing through HIR type inference to MIR monomorphization and bytecode execution. Written in Rust.

Austral

🤖 A systems language with linear types and capability-based security. The linear type checker is ~600 lines. OCaml bootstrap compiler targeting C. Designed by Fernando Borretti to fit in one person's head — the spec is deliberately small enough that a single developer can understand the entire language. Not functional in the Haskell sense, but linear types make it adjacent. An experiment in "what if we took linear types seriously but kept the language small."

AQaml

🤖 A self-hosting OCaml subset compiler targeting native x86-64. ~5,000–8,000 LOC. Adds records, variants, references, and garbage collection beyond what mlml supports. Triple self-hosting verified. Like mlml, it omits type inference — demonstrating the minimum OCaml needed for self-compilation.

Borgo

🤖 Adds ML-family features (algebraic data types, exhaustive pattern matching, Result/Option types) to Go's ecosystem by compiling to Go source code with Rust-like syntax. Written in Rust.

HaMLet

🤖 ~10,000–15,000 LOC, SML. Andreas Rossberg's most faithful implementation of the Definition of Standard ML. It implements all of SML '97 including the full module system (signatures, structures, functors), mapping rule-by-rule to the formal Definition. Jeremy Yallop recommends it as the most readable SML implementation. It can be bundled into a single SML file and compiled by any SML implementation. A compile-js branch demonstrates compilation to JavaScript.

SOSML

🤖 ~10,000–15,000 LOC, TypeScript. Implements the full SML core language in the browser: val/fun/datatype declarations, pattern matching, HM type inference, exceptions, and references. Used for teaching at Saarland University.

MicroHs

🤖 By Lennart Augustsson (one of GHC's original creators) — the most complete "small" Haskell compiler alive today. It compiles an extended subset of Haskell 2010 including type classes, do-notation, deriving, record syntax, overloaded literals, and modules. It is fully self-hosting and — critically — bootstrappable from only a C compiler (no pre-existing Haskell toolchain required). MicroHs translates Haskell to combinators executed by a C runtime. It has a JavaScript runtime target, a package manager (mcabal), and can compile real Hackage packages like QuickCheck. The codebase is not trivially small (estimated 15,000–30,000 lines across compiler, libraries, and runtime), but for what it does — a near-complete Haskell compiler bootstrappable from C — it is remarkably compact.


Read the original article

Comments

  • By taolson 2026-02-2022:591 reply

    Don't know if my language is considered Lil' enough for this, but it's a pure, lazy functional language based upon Miranda (progenitor language to Haskell) that compiles to x86-64 asm. ~6700 SLOC for the (self-hosted!) compiler, and ~3300 SLOC additional for the extensive library of functional data structures and functions.

    https://github.com/taolson/Admiran

    • By spragl 2026-02-218:341 reply

      I really loved Miranda back when I learned about it. I still have the book. I think it never took off because it was quite expensive for universities to use. Im sure David Turner regrets his price model today. Now he has made Miranda available here https://www.cs.kent.ac.uk/people/staff/dat/miranda/

      • By taolson 2026-02-2112:44

        Yes, the open-source release he did is what introduced me to Miranda. I rewrote a lot of my previous Haskell solutions to Advent of Code puzzles with it, and liked it so much I decided to try to improve on it ;-)

        That's what led to Admiran. I originally wrote Admiran in Miranda, then bootstrapped from that to self-hosting when it was stable enough to do so. The original Miranda combinator compiler / interpreter took 20 minutes to compile all of Admiran, while the self-hosted version now takes 20 seconds.

        One of the grad students of David Turner has taken up maintenance on the original Miranda source; the repository is now at https://codeberg.org/DATurner/miranda

  • By dunham 2026-02-2019:451 reply

    My little language Newt is 7 kloc. Dunno if it's worth including, it's mostly an exercise to learn how these things work and is not as polished as I'd like.

    - Self-hosted

    - Compiles to javascript

    - Bidirectional typechecking with NbE (based on elaboration zoo)

    - Dependent type checking

    - type classes

    - ADTs with dependent pattern matching

    - TCO (trampoline for mutually tail recursive functions)

    - Erasure of compile-time only values (0, ω quantities, but not linear)

    - Web playground

    - LSP (added this month)

    - Syntax is similar to Agda / Idris / Haskell

    https://github.com/dunhamsteve/newt

    • By taolson 2026-02-2022:501 reply

      Either newt was already in the list, or it got added. We talked a bit about using our languages for AoC 2024 -- looks like you've been keeping busy working on it!

      • By dunham 2026-02-2117:01

        Yeah it has been fun. Lots of directions I can take it:

        Since I have an LSP, I've got faster turn around and can add editor functionality that requires poking at the compile state. That's my current thread.

        I have a C backend on hold, while I think about how I want to represent data without boxing everything and about whether I want to do reference counting or GC. (Reference counting unlocks "counting immutable beans" if I decide to give that a go, but I'd also like to try implementing GC someday.)

        I should do some browser interop stuff and write something other than a compiler in my language.

        And there are language enhancements: implementing "Do unchained" from Lean, automatic handling of lazy and/or async modalities, deriving implementations of classes, ...

  • By kristianp 2026-02-2122:29

    Reminds me of Ghuloum's "An Incremental Approach to Compiler Construction" [1]. The tutorial [2], takes you through making a scheme compiler. It starts by compiling an integer to x86 assembly, using the output of gcc's assembly as reference. Then it goes through to tail calls and heap allocation, with a working compiler at every step. Tests at [3]. Mentioned in [4,5]. Gholoum went on to write Ikarus scheme which continued that work [6], but it never moved to 64 bit and the last release was in 2008.

    [1] http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf

    [2] https://web.archive.org/web/20061217101846/http://www.cs.ind...

    [3] https://web.archive.org/web/20090618030856/http://www.cs.ind...

    [4] http://lambda-the-ultimate.org/node/1752

    [5] https://news.ycombinator.com/item?id=21627615

    [6] https://web.archive.org/web/20090928074841/http://ikarus-sch...

HackerNews