
prats226

Karma: 224

Created: 2015-03-06

Recent Activity

  • One issue is that as soon as you make the questions public, even just by letting hosted LLMs predict on them, they are tainted and you can't use them anymore. So would it be a one-time test dataset?

  • For LLMs, programming languages are basically additional languages that we speak. So is the way an LLM handles low-resource programming languages the same as the way it handles spoken languages with less representation in the training data?

    In that case, DSLs would be even harder for LLMs to get right than the low-resource language itself.

  • I think the choice mainly stems from how you want to use the output. If the output is going to be fed to another LLM, then you want to select a markup language 1) whose grammar does not cause too many issues with tokenization, 2) which the LLM has seen a lot of in the past, and 3) which generates a minimal number of tokens. I think Markdown fits these criteria much better than other markup languages.

    If the goal is to parse the output programmatically, then I agree a more structured markup language is the better choice.

  • Given that the input is an image and not the raw PDF, it's not completely unexpected.

  • I think the author has taken a very long-term view of what has happened so far without getting too politically specific.

    I found it very informative for getting a sense of how many abstractions there are in the system. So at least I know where to deep dive.

HackerNews