A log scale is actually appropriate in this context from a first-principles perspective. Per scaling laws (and also general behavior of epsilon-probability of failure multiplied N times), you would generally expect more vs. less effective techniques to have multiplicatively greater or fewer steps until failure, not additively greater/fewer. Figure 1 is comical, but the appendix figure is the more scientifically appropriate one.
Very obviously missing the mundane agentic work. I think the following things are basically already solved, and are just waiting for the right harness:
- Call this government service center, wait on hold for 45 minutes, then when they finally answer, tell them to reactivate my insurance marketplace account that got wrongly deleted.
- Find a good dentist within 2mi from my house, call them to make sure they take my insurance, and book an appointment sometime in the next two weeks no earlier than 11am
- Figure out how I'm going to get from Baltimore to Boston next Thursday, here's $100 and if you need more, ask me.
- I want to apply a posterizing filter in photoshop, take control of my mouse for the next 10sec and show me where it is in the menu
- Call that gym I never go to and cancel my membership