textlapse

Karma: 312

Created: 2024-06-26

Recent Activity

  • If someone could please give an octopus a waterproof keyboard, perhaps we could have a kernel, a compiler and a new internet protocol all in one.

  • Commented: "Gemini 3.1 Pro"

    To really confuse it, ask it to take that tricycle with the platypus on it to a car wash.

  • It's definitely possible, I am not arguing against that.

    I am just saying it's not as flexible or cost-free as it would be on a 'normal' von Neumann-style CPU.

    I would love to see Rust-based code that obviates the need to write CUDA kernels (including compiling to different architectures). It feels icky to use/introduce things like async/await in the context of a GPU programming model, which is very different from a traditional Rust programming model.

    You still have to worry about different architectures and the streaming nature at the end of the day.

    I am very interested in this topic, so I am curious to learn how the latest GPUs help manage this divergence problem.

  • My understanding of warps (https://docs.nvidia.com/cuda/cuda-programming-guide/01-intro...) is that you are essentially paying the cost of taking both branches.

    I understand that with newer GPUs you have clever partitioning/pipelining, such that block A takes branch A while block B takes branch B, with sync/barrier primitives essentially relying on some smart 'oracle' to schedule these in a way that still fits the SIMT model.

    It still doesn't feel Turing complete to me. Is there an NVIDIA doc you can refer me to?
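    To make the "paying for both branches" intuition concrete, here is a toy lockstep model of a warp in plain C++. This is not real hardware or the CUDA execution model, just an illustrative sketch under the classic pre-Volta assumption: every lane executes every issued instruction, a per-lane active mask decides whose results commit, and a divergent branch costs the instruction count of path A plus path B. All names (`WARP`, the step counts) are made up for the example.

    ```cpp
    #include <array>
    #include <cstdio>

    // Toy SIMT model: a "warp" of 4 lanes runs in lockstep; a per-lane
    // active mask gates which lanes commit results. A divergent branch
    // is handled by issuing BOTH paths with complementary masks.
    constexpr int WARP = 4;

    int main() {
        std::array<int, WARP> x = {1, 2, 3, 4};
        std::array<int, WARP> out{};
        int issued = 0;  // instruction slots the whole warp pays for

        // Divergent branch: even lanes take path A, odd lanes path B.
        std::array<bool, WARP> maskA{}, maskB{};
        for (int lane = 0; lane < WARP; ++lane) {
            maskA[lane] = (x[lane] % 2 == 0);
            maskB[lane] = !maskA[lane];
        }

        // Path A (2 "instructions"): issued by the whole warp, but only
        // lanes with maskA set commit their results.
        for (int step = 0; step < 2; ++step) {
            ++issued;
            for (int lane = 0; lane < WARP; ++lane)
                if (maskA[lane]) out[lane] += x[lane];  // A: out = 2*x
        }

        // Path B (3 "instructions"): same story with the other mask.
        for (int step = 0; step < 3; ++step) {
            ++issued;
            for (int lane = 0; lane < WARP; ++lane)
                if (maskB[lane]) out[lane] += 10;       // B: out = 30
        }

        // The warp issued 2 + 3 = 5 slots, even though each individual
        // lane only needed 2 or 3 of them: that gap is the divergence cost.
        printf("instructions issued: %d\n", issued);
        for (int lane = 0; lane < WARP; ++lane)
            printf("lane %d -> %d\n", lane, out[lane]);
        return 0;
    }
    ```

    In this toy model the Turing-completeness worry dissolves a bit: each lane still computes exactly what its own control flow dictates, it just shares instruction-issue bandwidth with the other paths, so divergence is a throughput cost rather than an expressiveness limit.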

HackerNews