I wondered the same, but the rendering seems right and the output was almost instant. I'll recheck the token counter; anyway, as you say, fast isn't the same as practical. I actually had to train my own tiny model, https://huggingface.co/xaskasdf/brandon-tiny-10m-instruct, to fit something "usable", and it's basically a liar or disinformation machine haha
Actually, it's purely bandwidth-bound. The major bottleneck of the whole process, for me in this case, is my B450 mobo: it only supports PCIe 3.0 and gives the GPU x8 lanes instead of x16, so I'm capped until I get an X570, maybe. I should get around two to three times the tok speed with that upgrade alone.
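
For a rough sanity check on that estimate, here's a back-of-the-envelope sketch of the theoretical link bandwidth. It assumes the upgrade means PCIe 4.0 at x16 (and that the GPU and CPU can actually negotiate that link), and it only counts the 128b/130b line encoding, not protocol overhead, so real throughput will come in lower. The function name `pcie_gb_per_s` is just made up for this example.

```python
# Theoretical one-direction PCIe bandwidth, counting only line encoding.
# Assumption: "upgrade" = PCIe 4.0 x16; real-world numbers will be lower.

# Per-lane transfer rate in GT/s for each PCIe generation.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Return the theoretical bandwidth in GB/s for a given generation and lane count."""
    bits_per_second = TRANSFER_RATE_GT_S[gen] * ENCODING_EFFICIENCY  # Gbit/s per lane
    return bits_per_second / 8 * lanes  # bits -> bytes, then scale by lanes


current = pcie_gb_per_s(gen=3, lanes=8)    # B450 case: PCIe 3.0 x8
upgraded = pcie_gb_per_s(gen=4, lanes=16)  # assumed X570 case: PCIe 4.0 x16

print(f"PCIe 3.0 x8 : {current:.2f} GB/s")
print(f"PCIe 4.0 x16: {upgraded:.2f} GB/s")
print(f"ratio       : {upgraded / current:.1f}x")
```

On paper that's about 7.9 GB/s versus about 31.5 GB/s, so a 4x ceiling. Expecting two to three times the tok speed in practice is plausible if the transfer over the bus really is the dominant cost.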