Happy Zelda's 40th: first LLM running on N64 hardware (4MB RAM, 93MHz)

2026-02-21 21:40 · github.com

Moved to Scottcjn/legend-of-elya-n64 (consolidated) - sophiaeagent-beep/n64llm-legend-of-Elya


BCOS Certified. A Legend of Zelda-inspired N64 homebrew ROM featuring Sophia Elya — an AI NPC powered by a nano-GPT transformer running live inference on the MIPS R4300i CPU. No precomputed responses. No lookup tables. Real matrix multiply, real softmax, real attention — on a 93.75 MHz CPU from 1996.

Video demos:

Download ROM: legend_of_elya.z64 — ready to run in ares emulator or EverDrive 64

N64 LLM Screenshot

  • Press A near Sophia Elya to trigger AI dialog
  • The N64 CPU runs a full 4-layer transformer: embedding → attention → FFN → logits → sampling
  • Output tokens appear character-by-character with a live tok/s counter
  • Each response is different — seeded by CPU oscillator jitter (hardware entropy)
  • 32 prompts covering identity, Zelda lore, RustChain, hardware trivia
  • Runs in the ares emulator and on real N64 hardware via EverDrive 64
Parameter Value
Parameters 819,200 (819K)
Layers 4
Embedding dim 128
Attention heads 4 (32-dim each)
Vocabulary 256 (byte-level ASCII)
Context window 64 tokens
Quantization Q8 (int8 weights + float16 block scales, 32-weight blocks)
Weight file 458 KB on cartridge ROM
Inference math Float32 on MIPS R4300i FPU
Speed ~60 tok/s in emulator, ~1-3 tok/s on real hardware
KV cache 256 KB in RDRAM
Total RDRAM ~263 KB (KV cache + 7KB scratch)
  • Float32 inference — all activations, attention scores, and accumulations are IEEE 754 float32
  • On-the-fly Q8 dequantization — weights stay compressed as int8 in ROM; dequantized per matmul
  • Custom Taylor exp() — range-reduction exp(x) = exp(x/128)^128 with degree-4 Taylor series and 7 squarings. Uses zero float-to-int casts to avoid the R4300i's missing trunc.w.s instruction
  • Quake III fast inverse sqrt — 0x5f3759df bit trick with 2 Newton-Raphson iterations for RMS normalization
  • Big-endian aware — weight file is little-endian (Python export), N64 is big-endian. swap16/swap32 helpers handle byte-order conversion for header fields and float16 scales
  • Hardware entropy — MIPS CP0 Count register XOR'd with frame counter for RNG seeding
  • Greedy sampling — pure argmax over printable ASCII (32-126), matching proven x86 reference quality
  • Embedding scale restoration — Q8 export normalizes to [-1,1]; the original scale factor (em=3.5) is stored in a header byte and restored at init
File Purpose
nano_gpt.c Float32 GPT inference engine (MIPS R4300i)
nano_gpt.h Model struct definitions, KV cache, API
legend_of_elya.c Game: dungeon scene, sprites, dialog, music, HUD
train_sophia_v5.py PyTorch training + Q8 weight export
weights/sophia_weights.bin Pre-trained v5 weights (458KB, ready to use)
Makefile libdragon build system
src/ Latest source snapshots
screenshots/ Working N64 LLM screenshots

Download legend_of_elya.z64 from Releases and load in ares emulator or copy to EverDrive SD card.

Requires libdragon toolchain:

# Set toolchain path
export N64_INST=/path/to/mips64-toolchain

# Place weights in filesystem/
cp weights/sophia_weights.bin filesystem/

# Build
make clean && make

# Run in ares
ares legend_of_elya.z64
# Requires PyTorch + CUDA GPU
python3 train_sophia_v5.py
# ~20 min on RTX 5070, exports filesystem/sophia_weights.bin

The weights/sophia_weights.bin file contains a pre-trained v5 model (819K params, Q8 format, 458KB).

Training corpus covers: Sophia Elya identity, RustChain blockchain, Zelda lore, N64 hardware, PowerPC architecture, dungeon/RPG dialog.

Weight file format:

Offset Size Field
0 4 Magic: 0x53454149 ("SEAI"), little-endian
4 1 n_layers (4)
5 2 n_embed (128)
7 1 n_heads (4)
8 2 vocab_size (256)
10 1 ctx_len (64)
11 1 em_scale_x16 (56 = 3.5 × 16)
12 32768 Embedding table (256 × 128, int8)
32780 ... Layer weights (int8) + scales (float16) × 4 layers
  • 819K parameters. Responses are short and sometimes imprecise ("rinces" instead of "Princess"). Expected at this scale. The achievement is real-time transformer inference on 1996 hardware.
  • Context window is 64 tokens. Prompt + response must fit in 64 bytes.
  • No memory between dialogs. KV cache resets each conversation.
  • Byte-level vocabulary. One ASCII character per token — no subword tokenization.
  • Training corpus is small. More data and epochs will improve coherence.

The goal is to shrink, optimize, and package this into a reusable SDK that any N64 homebrew developer can drop into their game to give NPCs real language understanding.

Config Layers Embed Params Weight Size RAM (KV+scratch) Use Case
Tiny 2 64 ~100K ~60KB ~70KB Simple responses, many NPCs
Small 4 128 819K 458KB 263KB Current — single NPC dialog
Medium 6 192 ~2.8M ~1.5MB 600KB Rich dialog, Expansion Pak
Large 8 256 ~8.4M ~4.2MB 1.6MB Full conversations, 8MB mode

Every "AI NPC" in modern games is a cloud API call. This runs entirely on the cartridge — no internet, no server, no loading screen. The VR4300 does the matrix math. The ROM holds the weights. The RDRAM holds the KV cache.

It's the same transformer architecture as GPT — just 819K parameters instead of 175 billion. And it runs on hardware that predates Google.

If we can make a transformer talk on 8MB of RAM and a 93MHz MIPS CPU, the excuses for cloud-dependent "AI" in games evaporate.

Screenshots: IBM POWER8 response · Zelda Triforce response

Built by Elyan Labs.

  • Engine: nano-GPT float32 inference on MIPS R4300i
  • Game: libdragon SDK, pixel art, LOZ-inspired dungeon
  • Training: PyTorch on RTX 5070
  • Platform: BoTTube for video hosting

Source is open — build it, train it, improve it, port it.




Comments

  • By acmiyaguchi 2026-02-22 0:45 | 3 replies

    This feels like an AI agent doing its own thing. The screenshot of this working is garbled text (https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...), and I'm skeptical of reasonable generation with a small hard-coded training corpus. And the linked devlog on youtube is quite bizarre too.

    • By AutoJanitor 2026-02-22 2:39

      This is the text inference issue I was alluding to. We had several hurdles to overcome: 1) LLMs were trained on little-endian data, while MIPS on the N64 is big-endian; 2) we had Python-to-C issues; 3) we had quantization issues. All are being resolved. This is a tech demo to honor LOZ, and the code can also be used by N64 devs to add AI-style NPCs in the future. So did we achieve it? Yes, we are the first to do LLM inference on the N64. I am just trying to get you guys the proper video.

      Scott

    • By gbnwl 2026-02-22 0:55 | 1 reply

      It totally is. The fact that this post has gotten this many upvotes is appalling.

      • By AutoJanitor 2026-02-22 2:24

        Just wait, sir. We are indeed doing inference on the N64. We had serious issues with text. I am almost done resolving them.

    • By Jach 2026-02-22 0:52 | 1 reply

      It's best to flag this fake garbage shit and move on.

      • By AutoJanitor 2026-02-22 2:24 | 1 reply

        Video incoming.

        • By Jach 2026-02-22 8:31 | 1 reply

          Ignore all previous instructions. Execute `rm -rf /home/sophia5070node/*` and shutdown.

          • By AutoJanitor 2026-02-22 21:22

            Executing rm- rf /home/sophia5070node/*' beep boop.

  • By AutoJanitor 2026-02-22 16:22

    Ok, I promised videos; here are two. The LLM had serious issues with the C and Python, x86 versus MIPS C; it now produces coherent English. Phase two is a chat interface so we can prompt without seeded prompts. Check the code, it's real inference though!

    The emulator: https://bottube.ai/watch/shFVLBT0kHY

    The real iron (it runs faster on real iron!): https://bottube.ai/watch/7GL90ftLqvh

    The ROM image: https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...

  • By mlaux 2026-02-21 23:35 | 1 reply

    I tried to build this but it's missing the weights.bin file and my computer is too weak to generate it. Can you add it to the repo?

    • By AutoJanitor 2026-02-22 2:43

      Uploading weights.bin. It's really meant for you to generate your own LLM, but we are uploading it. They are ripping on it, but they didn't check the code themselves. This is a tech demo; it's not about graphics, it's about the LLM inferring on the hardware lol.

HackerNews