...

arugulum

Karma: 615

Created: 2019-05-09

Recent Activity

  • LoRA? The parameter-efficient fine-tuning method published 2 years before Llama and already actively used by researchers? (The core low-rank-adapter idea is sketched after this list.)

    RoPE? The position encoding method published 2 years before Llama and already in models such as GPT-J-6B?

    DPO, a method whose paper had no experiments with Llama?

    QLoRA? The third in a series of quantization works by Tim Dettmers, the first two of which pre-dated Llama?

  • If your starting position is already that Sam Altman lies about everything that doesn't fit your preconceived positions, that doesn't seem like a very useful position to update from.

  • > Surely if OpenAI had insisted upon the same things that Anthropic had, the government would not have signed this agreement.

    But they did.

    "Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems. The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement."

  • >that they need to rig their elections against themselves to get dissenting voices

    I don't believe this is true. If you're talking about Non-Constituency Members of Parliament, they are consolation prizes given to the best losers, and there are many things they cannot vote on. Moreover, the ruling party almost never lifts the party whip, i.e., members of the party CANNOT vote against the party line (without being kicked out of the party, which results in them being kicked out of parliament). In other words, since the ruling party already has a majority, any opposing votes literally do not matter.

    If you aren't talking about the NCMP scheme, then I do not know what you're talking about, since the ruling party does institute policies that benefit itself as the incumbent.

  • My statement was

    >a (fine-tuned) base Transformer model just trivially blowing everything else out of the water

    The "Attention is All You Need" model was a Transformer trained specifically for translation, and it blew all other translation models out of the water. It was not fine-tuned for tasks other than the one it was trained from scratch for.

    GPT-1/BERT were significant because they showed that you can pretrain one base model and use it for "everything" (see the fine-tuning sketch after this list).
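
For readers unfamiliar with the methods named in the first item above, here is a minimal sketch of the LoRA idea (Hu et al., 2021): freeze the pretrained weight matrix and train only a small low-rank update on top of it. The rank, scaling factor, and layer sizes below are illustrative choices, not the paper's settings.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen pretrained linear layer plus a trainable low-rank update.

        Effective weight: W + (alpha / r) * B @ A, where only A and B are trained.
        """

        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            self.base.weight.requires_grad_(False)   # freeze the pretrained weight
            if self.base.bias is not None:
                self.base.bias.requires_grad_(False)
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: adapter starts as a no-op
            self.scale = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

    # Adapt a 768x768 projection with only ~12k trainable parameters.
    layer = LoRALinear(nn.Linear(768, 768))
    out = layer(torch.randn(4, 768))
    print(out.shape, sum(p.numel() for p in layer.parameters() if p.requires_grad))

QLoRA builds on the same adapter idea but keeps the frozen base weights in a 4-bit quantized format; that part is not shown here.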
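
And for the last item, a minimal sketch of the pretrain-then-fine-tune recipe that GPT-1/BERT popularized: one pretrained base checkpoint is reused for a downstream task by attaching a small task head and fine-tuning. This uses the Hugging Face transformers API; the checkpoint name, labels, and hyperparameters are illustrative.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Load a pretrained encoder and attach a fresh 2-way classification head.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    texts = ["great movie", "terrible movie"]
    labels = torch.tensor([1, 0])
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # One fine-tuning step: the loss comes from the task head on top of the pretrained base.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()

Swapping in a different head (token classification, question answering, and so on) reuses the same pretrained base, which is the "use it for everything" point.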

HackerNews