...

hashta

71

Karma

2023-03-26

Created

Recent Activity

  • It’s literally called "SimpleFold". But that’s not really my point, from your earlier comment (".. go through all the complexities first to find the generalized and simpler formulations"), I got the impression you thought the simplicity came purely from architectural insights. My point was just that to compare apples to apples, a model claiming "simpler but just as good" should ideally train on the same kind of data as AF or at least acknowledge very clearly that substantial amount of its training data comes from AF.

    I’m not trying to knock the work, I think it’s genuinely cool and a great engineering result. I just wanted to flag that nuance for readers who might not have the time or background to spot it, and I get that part of the "simple/simpler" messaging is also about attracting attention which clearly worked!

  • To people outside the field, the title/abstract can make it sound like folding is just inherently simple now, but this model wouldn’t exist without the large synthetic dataset produced by the more complex AF. The "simple" architecture is still using the complex model indirectly through distillation. We didn’t really extract new tricks to design a simpler model from scratch, we shifted the complexity from the model space into the data space (think GPT-5 => GPT-5-mini, there’s no GPT-5-mini without GPT-5)

  • I’m not sure AF3’s performance would hold up if it hadn’t been trained on data from AF2 which itself bakes in a lot of inductive bias like equivariance

  • One caveat that’s easy to miss: the "simple" model here didn’t just learn folding from raw experimental structures. Most of its training data comes from AlphaFold-style predictions. Millions of protein structures that were themselves generated by big MSA-based and highly engineered models.

    It’s not like we can throw away all the inductive biases and MSA machinery, someone upstream still had to build and run those models to create the training corpus.

  • An effective way that usually increases accuracy is to use an ensemble of capable models that are trained independently (e.g., gemini, gpt-4o, qwen). If >x% of them have the same output, accept it, otherwise reject and manually review

HackerNews