declaredapple

287

Karma

2023-12-22

Created

Recent Activity

  • Some people may be willing to travel but don't know where the best place to go is.

  • It's going to be GPT-3.5 Turbo, not GPT-4. As others have mentioned, binggpt/Bard are also free.

    These smaller models are relatively cheap to run, especially at high batch sizes.

    I'm sure there will also be aggressive rate limits.

  • They've been designing their own chips for a while now, including ones with an NPU.

    Also, because of their unified memory design, they actually have insane bandwidth, which is incredibly useful for LLMs. IMO they may have a head start in that respect for on-device inference of large models (e.g. 1B+ params).
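    The bandwidth point can be made concrete with a back-of-the-envelope sketch: autoregressive decoding is typically memory-bandwidth bound, since generating each token requires streaming all the weights through the processor once. The numbers below are illustrative assumptions (not measurements from the comment): roughly 400 GB/s of unified memory bandwidth and a 7B-parameter model quantized to 4 bits.

    ```python
    # Rough upper bound on decode speed for a memory-bandwidth-bound LLM:
    # each generated token reads every weight once, so
    # tokens/sec <= bandwidth / model size in bytes.
    def tokens_per_sec(bandwidth_gbps: float, params_billion: float,
                       bytes_per_param: float) -> float:
        model_gb = params_billion * bytes_per_param
        return bandwidth_gbps / model_gb

    # Illustrative assumptions: ~400 GB/s unified memory,
    # 7B params at 4-bit quantization (0.5 bytes per param = 3.5 GB).
    print(round(tokens_per_sec(400, 7, 0.5), 1))  # ≈ 114.3 tokens/sec ceiling
    ```

    Real throughput lands below this ceiling (attention KV-cache reads, compute overhead), but it shows why raw memory bandwidth, not just NPU FLOPS, dominates single-stream on-device inference.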

  • The flash doesn't do the computations, though; that's just a method of getting it to the processor.

HackerNews