Comments

  • By tbruckner 2025-12-08 16:45

    A simple cue like asking the model to 'see' or 'hear' can push a purely text-trained language model towards the representations of purely image-trained or purely audio-trained encoders.

HackerNews