A Legend of Zelda-inspired N64 homebrew ROM featuring Sophia Elya — an AI NPC
powered by a nano-GPT transformer running live inference on the MIPS R4300i CPU.
No precomputed responses. No lookup tables. Real matrix multiply, real softmax, real
attention — on a 93.75 MHz CPU from 1996.
Video demos:
Download ROM:
`legend_of_elya.z64` — ready to run in the ares emulator or on an EverDrive 64
- Press A near Sophia Elya to trigger AI dialog
- The N64 CPU runs a full 4-layer transformer: embedding → attention → FFN → logits → sampling
- Output tokens appear character-by-character with a live tok/s counter
- Each response is different — seeded by CPU oscillator jitter (hardware entropy)
- 32 prompts covering identity, Zelda lore, RustChain, hardware trivia
- Runs in the ares emulator and on real N64 hardware via EverDrive 64
| Parameter | Value |
|---|---|
| Parameters | 819,200 (819K) |
| Layers | 4 |
| Embedding dim | 128 |
| Attention heads | 4 (32-dim each) |
| Vocabulary | 256 (byte-level ASCII) |
| Context window | 64 tokens |
| Quantization | Q8 (int8 weights + float16 block scales, 32-weight blocks) |
| Weight file | 458 KB on cartridge ROM |
| Inference math | Float32 on MIPS R4300i FPU |
| Speed | ~60 tok/s in emulator, ~1-3 tok/s on real hardware |
| KV cache | 256 KB in RDRAM |
| Total RDRAM | ~263 KB (KV cache + 7KB scratch) |
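The 256 KB KV-cache figure follows directly from the config above: one key vector and one value vector per layer, per context position, stored as float32. A quick sanity check (constants taken from the table):

```c
#include <stddef.h>

/* KV-cache footprint for the config above: K and V vectors per layer,
 * per context position, stored as float32 in RDRAM. */
enum { N_LAYERS = 4, CTX_LEN = 64, N_EMBED = 128 };

static size_t kv_cache_bytes(void)
{
    /* 2 tensors (K and V) x layers x positions x embedding dim x 4 bytes */
    return (size_t)2 * N_LAYERS * CTX_LEN * N_EMBED * sizeof(float);
}
/* 2 * 4 * 64 * 128 * 4 = 262,144 bytes = 256 KB, matching the table */
```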
- Float32 inference — all activations, attention scores, and accumulations are IEEE 754 float32
- On-the-fly Q8 dequantization — weights stay compressed as int8 in ROM; dequantized per matmul
- Custom Taylor `exp()` — range reduction `exp(x) = exp(x/128)^128` with a degree-4 Taylor series and 7 squarings. Uses zero float-to-int casts to avoid the R4300i's missing `trunc.w.s` instruction
- Quake III fast inverse sqrt — the `0x5f3759df` bit trick with 2 Newton-Raphson iterations for RMS normalization
- Big-endian aware — the weight file is little-endian (Python export) while the N64 is big-endian; `swap16`/`swap32` helpers handle byte-order conversion for header fields and float16 scales
- Hardware entropy — the MIPS CP0 Count register XOR'd with the frame counter for RNG seeding
- Greedy sampling — pure argmax over printable ASCII (32-126), matching the output quality of the proven x86 reference implementation
- Embedding scale restoration — Q8 export normalizes to [-1,1]; the original scale factor (em=3.5) is stored in a header byte and restored at init
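The custom `exp()` can be sketched as follows. This is a minimal reconstruction from the description above, not the repo's exact code; the constants and polynomial degree in `nano_gpt.c` may differ:

```c
/* Range-reduced exponential: exp(x) = exp(x/128)^128. Dividing by 128
 * shrinks the argument so a degree-4 Taylor series is accurate, then
 * squaring 7 times restores the result (2^7 = 128). There are no
 * float-to-int casts anywhere, so trunc.w.s is never needed. */
static float fast_exp(float x)
{
    float y = x * (1.0f / 128.0f);  /* range reduction */
    /* degree-4 Taylor of exp(y): 1 + y + y^2/2 + y^3/6 + y^4/24 (Horner) */
    float r = 1.0f + y * (1.0f + y * (0.5f + y * (1.0f / 6.0f + y * (1.0f / 24.0f))));
    for (int i = 0; i < 7; i++)     /* raise to the 128th power */
        r *= r;
    return r;
}
```

For softmax inputs (scores shifted so the maximum is 0, arguments ≤ 0), the relative error of this scheme stays around 1e-5 — far below what attention weights can notice.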
| File | Purpose |
|---|---|
| `nano_gpt.c` | Float32 GPT inference engine (MIPS R4300i) |
| `nano_gpt.h` | Model struct definitions, KV cache, API |
| `legend_of_elya.c` | Game: dungeon scene, sprites, dialog, music, HUD |
| `train_sophia_v5.py` | PyTorch training + Q8 weight export |
| `weights/sophia_weights.bin` | Pre-trained v5 weights (458 KB, ready to use) |
| `Makefile` | libdragon build system |
| `src/` | Latest source snapshots |
| `screenshots/` | Working N64 LLM screenshots |
Download `legend_of_elya.z64` from Releases and load it in the ares emulator, or copy it to an EverDrive SD card.
Requires the libdragon toolchain:

```sh
# Set toolchain path
export N64_INST=/path/to/mips64-toolchain

# Place weights in filesystem/
cp weights/sophia_weights.bin filesystem/

# Build
make clean && make

# Run in ares
ares legend_of_elya.z64
```

To retrain from scratch:

```sh
# Requires PyTorch + CUDA GPU
python3 train_sophia_v5.py
# ~20 min on RTX 5070, exports filesystem/sophia_weights.bin
```

The `weights/sophia_weights.bin` file contains a pre-trained v5 model (819K params, Q8 format, 458 KB).
Training corpus covers: Sophia Elya identity, RustChain blockchain, Zelda lore, N64 hardware, PowerPC architecture, dungeon/RPG dialog.
Weight file format:
| Offset | Size | Field |
|---|---|---|
| 0 | 4 | Magic: 0x53454149 ("SEAI"), little-endian |
| 4 | 1 | n_layers (4) |
| 5 | 2 | n_embed (128) |
| 7 | 1 | n_heads (4) |
| 8 | 2 | vocab_size (256) |
| 10 | 1 | ctx_len (64) |
| 11 | 1 | em_scale_x16 (56 = 3.5 × 16) |
| 12 | 32768 | Embedding table (256 × 128, int8) |
| 32780 | ... | Layer weights (int8) + scales (float16) × 4 layers |
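A header parser for this layout might look like the sketch below. Field offsets come from the table; reading the little-endian fields byte-by-byte sidesteps host endianness entirely, which is one alternative to the `swap16`/`swap32` helpers the repo uses (different mechanism, same result):

```c
#include <stdint.h>

/* Header fields per the offset table above. */
typedef struct {
    uint32_t magic;      /* 0x53454149 ("SEAI") */
    uint8_t  n_layers;
    uint16_t n_embed;
    uint8_t  n_heads;
    uint16_t vocab_size;
    uint8_t  ctx_len;
    float    em_scale;   /* em_scale_x16 / 16.0 */
} ModelHeader;

/* Assemble little-endian integers byte-by-byte: correct on both the
 * big-endian N64 and a little-endian development host. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static int parse_header(const uint8_t *p, ModelHeader *h)
{
    h->magic = read_u32le(p + 0);
    if (h->magic != 0x53454149u)     /* "SEAI" */
        return -1;
    h->n_layers   = p[4];
    h->n_embed    = read_u16le(p + 5);
    h->n_heads    = p[7];
    h->vocab_size = read_u16le(p + 8);
    h->ctx_len    = p[10];
    h->em_scale   = p[11] / 16.0f;   /* 56 -> 3.5 */
    return 0;
}
```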
- 819K parameters. Responses are short and sometimes imprecise ("rinces" instead of "Princess"). Expected at this scale. The achievement is real-time transformer inference on 1996 hardware.
- Context window is 64 tokens. Prompt + response must fit in 64 bytes.
- No memory between dialogs. KV cache resets each conversation.
- Byte-level vocabulary. One ASCII character per token — no subword tokenization.
- Training corpus is small. More data and epochs will improve coherence.
The goal is to shrink, optimize, and package this into a reusable SDK that any N64 homebrew developer can drop into their game to give NPCs real language understanding.
| Config | Layers | Embed | Params | Weight Size | RAM (KV+scratch) | Use Case |
|---|---|---|---|---|---|---|
| Tiny | 2 | 64 | ~100K | ~60KB | ~70KB | Simple responses, many NPCs |
| Small | 4 | 128 | 819K | 458KB | 263KB | Current — single NPC dialog |
| Medium | 6 | 192 | ~2.8M | ~1.5MB | 600KB | Rich dialog, Expansion Pak |
| Large | 8 | 256 | ~8.4M | ~4.2MB | 1.6MB | Full conversations, 8MB mode |
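The parameter counts in this table can be reproduced with a standard GPT layout. The assumptions below (tied embedding/output weights, no bias terms, 4× FFN expansion) are ours, not stated by the repo — but they reproduce the Small config's 819,200 figure exactly:

```c
/* Parameter count for a GPT with tied embedding/output weights, no
 * biases, and hidden = 4 * n_embed in the FFN. These are assumptions,
 * not confirmed by the repo; they match 819,200 exactly for Small. */
static long gpt_params(long layers, long embed, long vocab)
{
    long embedding = vocab * embed;           /* also reused as the output head */
    long attn      = 4 * embed * embed;       /* Wq, Wk, Wv, Wo */
    long ffn       = 2 * embed * (4 * embed); /* up + down projections */
    return embedding + layers * (attn + ffn);
}
/* gpt_params(4, 128, 256) == 819,200 */
```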
Every "AI NPC" in modern games is a cloud API call. This runs entirely on the cartridge — no internet, no server, no loading screen. The VR4300 does the matrix math. The ROM holds the weights. The RDRAM holds the KV cache.
It's the same transformer architecture as GPT — just 819K parameters instead of 175 billion. And it runs on hardware that predates Google.
If we can make a transformer talk on 8MB of RAM and a 93MHz MIPS CPU, the excuses for cloud-dependent "AI" in games evaporate.
| IBM POWER8 Response | Zelda Triforce Response |
|---|---|
| ![]() | ![]() |
Built by Elyan Labs.
- Engine: nano-GPT float32 inference on MIPS R4300i
- Game: libdragon SDK, pixel art, LOZ-inspired dungeon
- Training: PyTorch on RTX 5070
- Platform: BoTTube for video hosting
Source is open — build it, train it, improve it, port it.

