Just to be clear, by alternatives I mean the currently used oral PrEP pills: Truvada (which is available as a generic) and Descovy, both from Gilead. Both are one pill a day, and Truvada can also be taken on a regimen where you basically take it only for a couple of days before and after high-risk activity. And most importantly, it can be cheap since it’s generic.
This thing is cool but it would be even better if it didn’t cost a fortune.
It didn’t happen earlier because before the Gilded Age the US (among whites..) was actually quite good on equality, and then every time we were getting close, the opposing force took over: once at the beginning of the 20th century with workers’ rights / antitrust, and once in the 1930s with FDR’s New Deal. Interestingly, both times things got quite good for the people afterwards.
Not sure it’s gonna happen this time though.
> Also note, if the sequence length is not really much larger than the model dimension (at least two orders of magnitude more), the quadratic complexity of the self-attention is really not such a big issue - the matrix multiplication in the feed-forward layers will be usually 8x the model dimension squared, and thus that part will usually dominate.
This is incorrect in the case of batched inference. There are two bottlenecks at play, compute and memory, and your reasoning applies to compute. For memory it gets trickier: for the MLP layers you read the same set of weights once for all elements of the batch, while the KV cache for attention is different for each element. That’s why in practice the sequence length where attention starts to dominate is closer to model dimension / batch size, rather than just model dimension. And that number isn’t as high anymore.
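To put rough numbers on the memory side, here is a back-of-the-envelope sketch for a single layer during one decode step. It rests on several assumptions of mine, not anything from the parent comment: plain multi-head attention (no GQA/MQA), an MLP with the usual 4x expansion (so 8 * d_model^2 weights, matching the quote), weights and KV cache stored in the same dtype, and the attention projection weights ignored. The function names are made up for illustration.

```python
def mlp_weight_bytes(d_model: int, bytes_per_elem: int = 2) -> int:
    """Bytes of MLP weights read per layer per decode step.

    Assumes two matrices (d x 4d and 4d x d), i.e. 8 * d^2 parameters.
    These are read once and shared by every sequence in the batch.
    """
    return 8 * d_model * d_model * bytes_per_elem


def kv_cache_bytes(d_model: int, seq_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache read per layer per decode step.

    Each sequence has its own K and V, each of shape seq_len x d_model,
    so this term grows with the batch size.
    """
    return 2 * batch_size * seq_len * d_model * bytes_per_elem


def crossover_seq_len(d_model: int, batch_size: int) -> float:
    """Sequence length where KV cache reads equal MLP weight reads.

    Solving 8 * d^2 = 2 * B * L * d for L gives L = 4 * d / B.
    """
    return 4 * d_model / batch_size


if __name__ == "__main__":
    d = 4096  # hypothetical model dimension
    for batch in (1, 8, 64):
        print(batch, crossover_seq_len(d, batch))
```

Under these assumptions the crossover scales as d_model / batch_size up to a small constant, so with a batch of 64 and d_model of 4096 the KV cache reads already match the MLP weight reads at a few hundred tokens.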