...

foota

Karma: 7295

Created: 2014-10-04

Recent Activity

  • The absl library has a great write-up on programming with time: https://abseil.io/docs/cpp/guides/time

  • Commented: "TPU Deep Dive"

    5. The efficient market hypothesis is true :-)

  • I wonder if you could make a Frankenstein version of differential datalog by combining the OP repo with salsa[1] (the crate that powers rust-analyzer).

    [1] https://github.com/salsa-rs/salsa

  • I'm not sure, but it clearly wasn't sufficient if it does.

    I guess the issue here is that if you're crash looping, each task generates load retrying to get the config once it comes up. So even if you're no longer crash looping (and hence no longer backing off at Borg), you're still causing overload.

    As long as the initial rate of tasks coming up is enough to cause overload, the outage will persist even once all tasks are up (assuming the overload is sufficient to bring the goodput of tasks becoming healthy to near zero).

    Interestingly, you can read that one of the mitigations they applied was to fan out config reads to the multiregional mirrors of the database instead of just the regional us-central1 mirror. Presumably the multiregional mirrors brought in significantly more capacity than the regional mirror alone, spreading the load.

    I'd be curious to know how much configuration they're loading that it caused such load.
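    To make the retry argument above concrete, here is a toy sketch. Everything in it is an assumption for illustration and not how the actual system works: the tick-based model, the `simulate` function, the capacity numbers, and the rule that an overloaded server serves nothing in that tick. It compares tasks that retry on every tick against tasks that use jittered exponential backoff, a technique swapped in here for contrast rather than one taken from the incident write-up.

```python
import random


def simulate(num_tasks, capacity, ticks, backoff, seed=0):
    """Return how many tasks become healthy within `ticks` ticks.

    Toy model (all assumptions): a task becomes healthy after one successful
    config read. The server serves every request in a tick when the load is
    within `capacity`, and serves none when it is overloaded.
    """
    rng = random.Random(seed)        # fixed seed keeps runs repeatable
    next_try = [0] * num_tasks       # tick at which each task reads next
    failures = [0] * num_tasks       # failed reads per task so far
    healthy = [False] * num_tasks

    for tick in range(ticks):
        due = [i for i in range(num_tasks)
               if not healthy[i] and next_try[i] <= tick]
        if len(due) <= capacity:
            # Load fits: every request in this tick succeeds.
            for i in due:
                healthy[i] = True
            continue
        # Overload: goodput is zero, every request in this tick fails.
        for i in due:
            failures[i] += 1
            if backoff:
                # Exponential window with full jitter, capped at 2**10 ticks.
                window = 2 ** min(failures[i], 10)
                delay = rng.randint(1, window)
            else:
                delay = 1            # retry on the very next tick
            next_try[i] = tick + delay
    return sum(healthy)


tight_retry = simulate(num_tasks=100, capacity=10, ticks=1000, backoff=False)
jittered = simulate(num_tasks=100, capacity=10, ticks=1000, backoff=True)
print(tight_retry, jittered)
```

    In this model the tight-retry run never recovers: all 100 tasks retry on every tick against a capacity of 10, so no read ever succeeds even though nothing is crash looping anymore. With jitter the retries spread out over time, so some ticks should fall under capacity and let tasks through.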

  • Generally, even these emergency changes are not rolled out entirely immediately, to prevent a fix from making things worse. This is an operational choice though, not a technical limitation. My guess, having been involved in similar issues in the past, is that the ~15 minute delay in preparing the change was either because it wasn't a normally used big red button, so it wasn't clear how to use it, or because there was some other friction in preparing the change.

HackerNews