The temperature parameter largely went away when we moved towards reasoning models, which output lots of reasoning tokens before you get to the actual output tokens. I don’t know if it was found that reasoning works better with a higher temperature, or that having separate temperatures for reasoning vs. output wasn’t practical, but that’s the timing I observed, anyway. And to the other commenter’s point, even a temperature of 0 is not deterministic if the batches are not invariant, which they’re not in production workloads.
If you’re using a model from a provider (not one that you’re hosting locally), greedy decoding via temperature = 0 does not guarantee determinism. A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to lack of batch invariance [1].
[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
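The floating-point part is easy to see in a few lines: float addition isn’t associative, so the same values reduced in a different order (which is what a different batch shape can cause inside the serving kernels) produce slightly different results, and a near-tied pair of logits can then flip the argmax that greedy decoding picks. A toy Python sketch; the logit values here are made up for illustration, not from any real model:

```python
# Float addition is not associative: the same three numbers summed in
# two different orders give different results.
a = (0.1 + 1e16) - 1e16   # 0.1 is absorbed into 1e16's rounding, then cancelled -> 0.0
b = 0.1 + (1e16 - 1e16)   # cancellation happens first, so 0.1 survives -> 0.1

# Hypothetical near-tied logits: the tiny wobble from reduction order
# is enough to change which token greedy decoding (temperature 0) picks.
logits_order1 = [2.0, 2.0 + a]   # [2.0, 2.0] -> argmax is token 0 (first of a tie)
logits_order2 = [2.0, 2.0 + b]   # [2.0, 2.1] -> argmax is token 1

greedy1 = max(range(2), key=lambda i: logits_order1[i])
greedy2 = max(range(2), key=lambda i: logits_order2[i])
print(greedy1, greedy2)   # 0 1 — same "model", different token
```

Real kernels sum thousands of terms per logit, but the mechanism is exactly this: batch size changes the reduction order, the reduction order changes the low bits, and the low bits occasionally change the argmax.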
100%. This is what I posted about on Hacker News ([1] where it got no traction) and Reddit [2] (where it led to a discussion but then got deleted by a mod).
[1] https://news.ycombinator.com/item?id=46705588
[2] https://www.reddit.com/r/ExperiencedDevs/comments/1qj03gq/wh...