Thanks — appreciate it!
Good question. In theory, you can compile anything Turing-complete to anything else — ARM and Python are both Turing-complete. But practically, Python's execution model (dynamic typing, a stack-based VM) doesn't map cleanly onto ARM's register-based instruction set, where operand types are fixed per instruction. PySM is designed to match Python's structure much more naturally — it keeps the system efficient, simpler to pipeline, and avoids needing lots of extra translation layers.
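To make the stack-machine point concrete (this is generic CPython behavior, not PyXL-specific): the standard `dis` module shows that even a trivial function compiles to instructions that push and pop an implicit value stack rather than naming registers:

```python
import dis

def add(a, b):
    # At the bytecode level, this pushes both locals onto CPython's
    # value stack, then a binary-add instruction pops two values and
    # pushes the result.
    return a + b

# Disassemble to show the stack-oriented instruction stream.
dis.dis(add)
```

The output (opcode names vary slightly across CPython versions) is a sequence like LOAD_FAST / LOAD_FAST / a binary add / RETURN_VALUE — no registers anywhere, which is part of why a direct Python-to-ARM mapping is awkward.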
I'm a software engineer by background, mostly in high-frequency trading (HFT), HPC, systems programming, and networking — so a lot of focus on efficiency and low-level behavior. I had played a bit with FPGAs before, but nothing close to this scale — most of the hardware and Python internals work I had to figure out along the way.
Thanks so much — really appreciate it!
Right now, the plan is to present it at PyCon first (next month) and then publish more about the internals afterward. Long-term, I'm keeping an open mind, not sure yet.
My background is in high-frequency trading (HFT), high-performance computing (HPC), systems programming, and networking. I don't come from a hardware background — or at least, I didn't when I started — but coming from the software side gave me a different perspective on how dynamic languages could be made much more efficient at the hardware level.
Difficult - adapting the Python execution model to my needs in a way that keeps it self-coherent, if that makes sense. This is still fluid and not finalized...
Easy - Not sure I'd categorize it as easy, but it was more surprising: the current implementation is rather simple and elegant (at least I think so :-) ), with still no advanced CPU design tricks (branch prediction, superscalar execution, etc.). So even now, I'm getting a huge improvement over the CPython and MicroPython VMs in the known Python bottlenecks (branching, function calls, etc.)
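As a rough illustration of one of those bottlenecks (ordinary CPython behavior, nothing PyXL-specific), function-call overhead is easy to see with the standard `timeit` module:

```python
import timeit

N = 500_000

# Same arithmetic, inlined vs. behind a function call.
inline_time = timeit.timeit("x + 1", setup="x = 1", number=N)
call_time = timeit.timeit(
    "f(x)",
    setup="def f(y): return y + 1\nx = 1",
    number=N,
)

# On CPython the call version is typically a few times slower;
# most of the gap is frame setup/teardown, not the addition itself.
print(f"inline: {inline_time:.4f}s  call: {call_time:.4f}s")
```

That per-call interpreter overhead is exactly the kind of cost a hardware pipeline built around Python's own semantics can attack directly.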
Not yet — I'm currently testing on a Zynq-7000 platform (embedded-class FPGA), mainly because it has an ARM CPU tightly integrated (and it's rather cheap). I use the ARM side to handle IO and orchestration, which lets me focus the FPGA fabric purely on the Python execution core, without having to build all the peripherals from scratch at this stage.
To run PyXL on a server-class FPGA (like Azure instances), some adaptations would be needed — the system would need to repurpose the host CPU to act as the orchestrator, handling memory, IO, etc.
The question is: what's the actual use case of running on a server? Besides testing max frequency -- for which I could just run Vivado against a different target (would need a license for it, though)
For now, I'm focusing on validating the core architecture, not just chasing raw clock speeds.