...

potatolicious

Karma: 31367

Created: 2008-09-24

Recent Activity

  • You really want to break a task like this down into its constituent parts - especially because in this case the "end-to-end" way of doing it (i.e., raw audio to summary) doesn't actually get you anything.

    IMO the right way to do this is to feed the audio into a transcription model, specifically one that supports diarization (separation of multiple speakers). This will give you a high quality raw transcript that is pretty much exactly what was actually said.

    It would be rough in places (e.g., "Speaker 1", "Speaker 2", etc. rather than actual speaker names).

    Then you want to post-process with an LLM to re-annotate the transcript and clean it up (e.g., replace "Speaker 1" with "Mayor Bob"), and query against it (rough sketch of this pipeline at the end of this comment).

    I see another post here complaining that direct-to-LLM beats a transcription model like Whisper - I would challenge that. Any modern ASR model will do a very, very good job with 95%+ accuracy.
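
    A minimal sketch of the pipeline I'm describing, assuming the openai-whisper package for ASR; the diarize() and llm_complete() helpers are hypothetical placeholders for whatever diarization model and LLM client you actually use:

        import whisper

        def transcribe_segments(audio_path: str) -> list[dict]:
            # Whisper segments come back with start/end timestamps and the raw text.
            model = whisper.load_model("medium")  # pick a size that fits your hardware
            return model.transcribe(audio_path)["segments"]

        def diarize(audio_path: str) -> list[dict]:
            # Hypothetical: run a diarization model and return speaker turns like
            # [{"start": 0.0, "end": 12.3, "speaker": "Speaker 1"}, ...]
            raise NotImplementedError

        def llm_complete(prompt: str) -> str:
            # Hypothetical: call whatever LLM API you have access to.
            raise NotImplementedError

        def build_raw_transcript(audio_path: str) -> str:
            segments = transcribe_segments(audio_path)
            turns = diarize(audio_path)
            lines = []
            for seg in segments:
                # Attribute each ASR segment to the speaker turn it starts inside.
                speaker = next(
                    (t["speaker"] for t in turns if t["start"] <= seg["start"] < t["end"]),
                    "Unknown speaker",
                )
                lines.append(f"{speaker}: {seg['text'].strip()}")
            return "\n".join(lines)

        def clean_transcript(raw: str) -> str:
            # Post-process: swap "Speaker 1" etc. for real names inferred from
            # context, without changing what was actually said.
            return llm_complete(
                "Re-annotate this transcript with real speaker names where they "
                "can be inferred from context; do not alter the wording:\n\n" + raw
            )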

  • Commented: "Steam Frame"

    Your 2K monitor occupies something like a 20-degree field of view from a normal sitting position/distance. The 2K resolution in a VR headset covers the entire field of view.

    So effectively your 2K monitor has ~6x the angular pixel density (pixels per degree) of the VR headset (back-of-the-envelope math below).
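
    Back-of-the-envelope version of that comparison (the resolution and FOV numbers here are hypothetical round figures, not measured specs):

        # A ~1920-pixel-wide monitor spanning ~20 degrees of your view, vs. a
        # headset spreading a similar horizontal resolution per eye across ~100
        # degrees. Angular pixel density = pixels / degrees covered.
        monitor_ppd = 1920 / 20    # ~96 pixels per degree
        headset_ppd = 2000 / 100   # ~20 pixels per degree
        print(monitor_ppd / headset_ppd)  # ~5x; the exact multiple depends on
                                          # which resolution and FOV you assume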

  • Commented: "Steam Frame"

    Yes, but that can create major motion sickness issues - motion that does not correspond to the user's actual physical movements creates a dissonance that manifests as motion sickness for a large portion of the population.

    This is the main reason many VR games don't let you just walk around and opt for teleportation-based movement systems - your avatar moving while your body doesn't can be quite physically uncomfortable.

    There are ways of minimizing this - for example, some VR games give you "tunnel vision" by blacking out peripheral vision while the movement is happening. But overall there are a lot of ergonomic considerations here and no perfect solution. The equivalent for a virtual desktop might be to limit the size of the window while the user is zooming/panning.

  • Commented: "Steam Frame"

    Oh yeah, for sure. Most people seem to accept that 35ppd is "good enough" but not actually at par with a high-quality, high-DPI monitor.

    I agree with you - I would personally consider 35ppd to be the floor for usability for this purpose. It's good in a pinch (need a nice workstation setup in a hotel room?), but I would not currently consider any extant hardware a full-time replacement for a good monitor.

  • Commented: "Steam Frame"

    There's no precise criterion, but the usual measure is ppd (pixels per degree), and it needs to be high enough that detailed content (such as text) displayed at a reasonable size is clearly legible without eye strain.

    > "Could you not just move your face closer to the virtual screen to see finer details?"

    Sure, but then you have the problem of, say, using an IMAX screen as your computer monitor. The level of head motion required to consume screen content (i.e., a ton of large head movements) would make the device very uncomfortable quite quickly.

    The Vision Pro has ~35ppd and people generally seem to think it hits the bar for monitor replacement. The Meta Quest 3 has ~25ppd and people generally seem to think it does not. The Steam Frame is, specs-wise, much closer to the Quest 3 than the Vision Pro.

    There are some software things you can do to increase the legibility of details like text, but ultimately you do need physical pixels (rough ppd math below).
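
    If you want to sanity-check a spec sheet yourself, ppd is roughly horizontal pixels per eye divided by horizontal field of view - a simplification that ignores lens distortion, and the numbers below are hypothetical rather than any specific headset's:

        def ppd(horizontal_pixels_per_eye: int, horizontal_fov_degrees: float) -> float:
            # Ballpark angular resolution: pixels spread evenly across the FOV.
            return horizontal_pixels_per_eye / horizontal_fov_degrees

        print(ppd(2000, 100))  # 20.0 -> well short of the ~35 ppd "monitor
                               # replacement" bar discussed above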

HackerNews