Have you tried the Pocket 386? https://arstechnica.com/gadgets/2024/08/a-few-weeks-with-the...
I'm very interested in this field (realtime audio + GPU programming). How do you deal with the latency? Do you send single or multiple vectors/buffers to the GPU at a time?
Also, I think samples in one channel need to be processed sequentially. Does that mean mono audio processing won't benefit much from GPU programming? Or maybe you are dealing with spectral signal processing?
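
To make the "sequential" concern concrete, here is a rough sketch of what I have in mind (my own illustration with made-up function names, not anything from your project). A recursive filter needs the previous output before it can compute the next one, while an FIR-style output sample only reads inputs, so each sample could in principle be computed independently:

```python
def one_pole_lowpass(x, a=0.5):
    """Recursive (IIR) filter: y[n] = a*x[n] + (1 - a)*y[n-1].

    Each output depends on the previous output, so the loop over one
    channel is inherently serial in this naive form.
    """
    y = []
    prev = 0.0
    for sample in x:
        prev = a * sample + (1.0 - a) * prev
        y.append(prev)
    return y


def fir_output_at(x, taps, n):
    """One FIR output sample: y[n] = sum_k taps[k] * x[n-k].

    It reads only input samples, never earlier outputs, so every n
    could be handled by a separate thread.
    """
    return sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)


def fir_filter(x, taps):
    # Each element of this list is independent of the others.
    return [fir_output_at(x, taps, n) for n in range(len(x))]
```

My worry is about the first kind of processing; the second kind looks like it would map to a GPU fine even for a single mono channel.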