Cool! I’ve been working on adding the same thing for Apple Silicon within my general “make autoresearch a serious tool” project here: https://github.com/Entrpi/autoresearch-everywhere
This is very much in line with what I found fascinating about optimizing microgpt for speed (0) — or rather, with what I was able to do with it after doing so. It's so small and so fast to train that you can really dig deep into the optimization landscape. I've spent all my free time this past week digging into it.
0: https://entrpi.github.io/eemicrogpt/ (The writeup is from a few days ago, and I'm still running experiments before I do a big rewrite. Slowrun is good food for thought.)
Topical. My hobby project this week (0) has been hyper-optimizing microgpt for the M5's CPU cores (and comparing against MLX performance). I wonder if anything changes under the regime I've been chasing with these new chips.