We put Claude Code in Rollercoaster Tycoon

2026-01-12 14:28 | labs.ramp.com

AI autonomously manages a theme park in the classic game RollerCoaster Tycoon, placing rides, fixing infrastructure, and generating CFO reports, all via command line.


Read the original article

Comments

  • By ninkendo 2026-01-17 23:02 | 18 replies

    Related:

    I’ve always found it crazy that my LLM has access to such terrible tools compared to mine.

    It’s left with grepping for function signatures, sending diffs for patching, and running `cat` to read all the code at once.

    I however, run an IDE and can run a simple refactoring tool to add a parameter to a function, I can “follow symbol” to see where something is defined, I can click and get all usages of a function shown at a glance, etc etc.

    Is anyone working on making it so LLMs get better tools for actually writing/refactoring code? Or is there some “bitter lesson”-like argument that effort is always better spent just increasing the context size and slurping up all the code at once?

    • By nbardy 2026-01-18 3:44 | 4 replies

      > Claude Code officially added native support for the Language Server Protocol (LSP) in version 2.0.74, released in December 2025.

      I think from training it's still biased towards simple tooling.

      But there is also real power in simple tools: a small set of general-purpose tools beats a bunch of narrow, specific-use-case tools. It's easier for humans to use high-level tools, but for LLM's they can instantly compose the low level tools for their use case and learn to generalize; writing insane Perl one-liners is second nature for them in a way it isn't for us.

      If you watch the tool calls, you'll see they write a ton of one-off small Python programs to test, validate, explore, etc.

      If you think about it, any time you use a tool there is probably a 20-line Python program that is more fit to your use case; it's just that it would take you too long to write, whereas for an LLM that's 0.5 seconds.

      • By frumplestlatz 2026-01-18 10:18

        > but for LLM's they can instantly compose the low level tools for their use case and learn to generalize

        Hard disagree; this wastes enormous amounts of tokens, and massively pollutes the context window. In addition to being a waste of resources (compute, money, time), this also significantly decreases their output quality. Manually combining painfully rudimentary tools to achieve simple, obvious things -- over and over and over -- is *not* an effective use of a human mind or an expensive LLM.

        Just like humans, LLMs benefit from automating the things they need to do repeatedly so that they can reserve their computational capacity for much more interesting problems.

        I've written[1] custom MCP servers to provide narrowly focused API search and code indexing, build system wrappers that filter all spurious noise and present only the material warnings and errors, "edit file" hooks that speculatively trigger builds before the LLM even has to ask for it, and a litany of other similar tools.

        Due to LLMs' annoying tendency to fall back on inefficient shell scripting, I also had to write a full bash syntax parser and shell-script-rewriting ruleset engine that lets me silently rewrite their shell invocations into more optimal forms using the other tools I've written. Without it, they do expensive, wasteful things like piping build output through `head`/`tail`/`grep`/etc., which makes them invariably miss important information and either wander off into the weeds or, if they notice, burn a huge number of turns (and time) re-running commands to get what they need.

        Instead, they call build systems directly with arbitrary options, | filters, etc, and magically the command gets rewritten to something that will produce the ideal output they actually need, without eating more context and unnecessary turns.
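
        A minimal sketch of that rewriting idea, with illustrative rules rather than the commenter's actual engine (`build-filter` is a hypothetical wrapper tool):

```python
# Sketch: rewrite an agent's shell invocation to a preferred form.
# The rules below are illustrative assumptions, not a real ruleset.
import re

# Each rule: (anchored pattern over the command line, replacement template).
REWRITE_RULES = [
    # Plain recursive grep -> ripgrep, which honors .gitignore.
    (re.compile(r"^grep\s+-r(?:n)?\s+(.*)$"), r"rg \1"),
    # Piping a build through head/tail/grep loses context; route it
    # through a hypothetical wrapper that keeps only warnings/errors.
    (re.compile(r"^(make\b[^|]*)\|\s*(?:head|tail|grep)\b.*$"), r"build-filter \1"),
]

def rewrite(command: str) -> str:
    """Return the rewritten command, or the original if no rule matches."""
    command = command.strip()
    for pattern, template in REWRITE_RULES:
        if pattern.match(command):
            return pattern.sub(template, command).strip()
    return command
```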

        LLMs benefit from an IDE just like humans do -- even if an "IDE" for them looks very different. The difference is night and day. They produce vastly better code, faster.

        [1] And by "I've written", I mean I had an LLM do it.

      • By forty 2026-01-18 11:26

        Note that the Claude Code LSP integration was actually broken for a while after it was released, so make sure you have a very recent version if you want to try it out.

        However, as the parent comment said, it seems to always grep instead unless explicitly told to use the LSP tool.

      • By cududa 2026-01-18 5:05 | 1 reply

        Correct. If you try to create a coding agent using the raw Codex or Claude Code API, build your own “write tool”, and don't give the model its native patch tool, 70%+ of the time its write/patch fails, because it tries to do the operation with the write/patch tool it was trained on.

        • By htrp 2026-01-18 18:02

          part of the value add of owning both the model and the tooling

      • By cm2187 2026-01-18 9:01 | 1 reply

        We are back to RISC vs CISC!

        • By htrp 2026-01-18 18:03

          history doesn't repeat but it definitely rhymes

    • By KronisLV 2026-01-18 0:31 | 6 replies

      > I however, run an IDE and can run a simple refactoring tool to add a parameter to a function, I can “follow symbol” to see where something is defined, I can click and get all usages of a function shown at a glance, etc etc

      I am so surprised that all of the AI tooling mostly revolves around VSC or its forks, and that JetBrains seems to not really have done anything revolutionary in the space.

      With how good their refactoring and code-inspection tools are, you'd really think they'd pass that context information off to AI models and be leaps and bounds ahead.

      • By harikb 2026-01-18 3:23 | 2 replies

        Recently, all these agents have learned to talk LSP (Language Server Protocol), so this should get better soon. That said, they still don't seem to default to `ripgrep` even when it is clearly better than `grep`.

        • By virtualritz 2026-01-18 18:06

          What you really want is ast-grep[1].

          Ripgrep is much faster than grep, but the result is no more concise, so tokens are still wasted.

          I think Codex uses ast-grep by default if it's installed; Claude has to be instructed?

          [1] https://ast-grep.github.io/
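
          The advantage of ast-grep is that it matches syntax-tree structure rather than raw text. The same idea can be sketched with Python's stdlib `ast` module (an illustration of structural search, not ast-grep itself):

```python
# Structural search: find calls to a given function name by walking
# the AST, so comments and strings that merely mention the name
# don't produce the false positives a plain grep would.
import ast

def find_calls(source: str, func_name: str) -> list[int]:
    """Return sorted line numbers where `func_name(...)` is actually called."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            fn = node.func
            if isinstance(fn, ast.Name) and fn.id == func_name:
                lines.append(node.lineno)
    return sorted(lines)
```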

        • By wahnfrieden 2026-01-18 4:49 | 1 reply

          Codex likes to use ripgrep.

          • By je42 2026-01-18 7:26

            Claude's search tool uses ripgrep, which is embedded in Claude.

      • By eek2121 2026-01-18 1:05 | 1 reply

        Are you? I'm not surprised at all, considering that the biggest investment juggernaut in AI is also the author of VSC. I wonder what the connection is? ;)

        • By eru 2026-01-18 2:53 | 2 replies

          Well, Google also has their own AIs and lots of money to throw around.

          • By tvink 2026-01-18 9:57

            Unfortunately they have an abysmal design sense for TUIs and an inability to recognize the good feature requests they're getting.

          • By pjmlp 2026-01-18 11:29

            And yet, contrary to Microsoft and Apple, they outsource most of their main development tools.

            Go and Dart hardly get the love across their SDKs that Objective-C, Swift, C#, and VB get from their owners.

            Same with IDE tooling: fully dependent on JetBrains and Microsoft.

      • By penneyd 2026-01-18 0:43

        Agreed, this seems like a no-brainer; surely it's something being worked on.

      • By htrp 2026-01-18 17:59

        JetBrains is trying, but I feel like they're very, very behind in the space.

      • By epicureanideal 2026-01-18 16:57

        Claude and other LLMs can be used through JetBrains, and the IDE provides a significantly better experience than VS Code in my opinion.

      • By PlatoIsADisease 2026-01-18 16:36

        I haven't seen JetBrains as 'great'. I think they have a strong marketing team that gets into universities and potentially astroturfs on the internet, but I have always found better tools for every language. Although, I can't remember what I ended up choosing for PHP.

    • By mulmboy 2026-01-18 2:21

      LLMs aren't like you or me. They can comprehend large quantities of code quickly and piece things together easily from scattered fragments, so go-to-reference etc. become much less important. Things change, of course, as the number of usages of a symbol grows large, but in most cases the LLM can make perfect sense of things via grep.

      To provide it access to refactoring as a tool also risks confusing it via too many tools.

      It's the same reason that waffling for a few minutes via speech to text with tangents and corrections and chaos is just about as good as a carefully written prompt for coding agents.

    • By fragmede 2026-01-17 23:19 | 1 reply

      Anthropic, for one.

      > Added LSP (Language Server Protocol) tool for code intelligence features like go-to-definition, find references, and hover documentation

      https://github.com/anthropics/claude-code/blob/main/CHANGELO...

      • By novaleaf 2026-01-17 23:56 | 1 reply

        their C# LSP theoretically worked for a week or so (I never saw it in action, though), but now it always errors on launch :(

        • By forty 2026-01-18 11:27

          There was an issue in Claude Code which is fixed in the latest release.

    • By hippo22 2026-01-17 23:11 | 1 reply

      If you can read fast enough, grepping is probably faster than waiting for a compiler to tell you anything.

      • By gf000 2026-01-17 23:20 | 2 replies

        Faster, but for worse results. Determining the source of a symbol is not as trivial as finding the same piece of text somewhere else; you also need to reliably differentiate between identically named symbols. What better source for that than the compiler itself?

        • By ninkendo 2026-01-17 23:27

          Yeah, especially for languages that make heavy use of type inference. There’s nothing you can really grep for most of the time… to really know “who’s using this code” you need to know what the compiler knows.

          An LLM can likely approach compiler-level knowledge just from being smart and understanding what it’s reading, but it costs a lot of context to do this. Giving the LLM access to what the compiler knows as an API seems like it’s a huge area for improvement.

        • By squirrellous 2026-01-18 3:32

          It depends on the language and codebase. For something very dynamic like Python it may be the case that grepping finds real references to a symbol that won’t be found by a language server. Also language servers may not work with cross-language interfaces or codegen situations as well as grep.

          OTOH for a giant monorepo, grep probably won’t work very well.

    • By fancy_pantser 2026-01-18 2:09

      Zed Editor gives the LLM tools that use the LSP as you'd expect as a normal IDE user, like "go to symbol definition" so it greps a lot less.

    • By selcuka 2026-01-18 12:07

      JetBrains IDEs come with an MCP server that supports some refactoring tools [1]:

      > Starting with version 2025.2, IntelliJ IDEA comes with an integrated MCP server, allowing external clients such as Claude Desktop, Cursor, Codex, VS Code, and others to access tools provided by the IDE. This provides users with the ability to control and interact with JetBrains IDEs without leaving their application of choice.

      [1] https://www.jetbrains.com/help/idea/mcp-server.html#supporte...

    • By ricw 2026-01-18 13:27

      Tidewave.ai does exactly that. It's made Claude Code so much more functional. It provides MCP servers to:

      - search all your code efficiently
      - search all documentation for libraries
      - access your database and get real data samples (not just abstract data types)
      - select design components from your Figma project and implement them for you
      - let Claude see what is rendered in the browser

      It's basically the IDE for your LLM client. It really closes the loop and has made Claude and myself so much more productive. Highly recommended, and cheap at $10/month.

      Ps: my personal opinion. I have Zero affiliation with them

    • By Wowfunhappy 2026-01-18 15:26 | 1 reply

      LLMs operate on text. They can take in text, and they can produce text. Yes, some LLMs can also read and even produce images, but at least as of today, they are clearly much better at using text[1].

      So cat, ripgrep, etc are the right tools for them. They need a command line, not a GUI.

      1: Maybe you'd argue that Nano Banana is pretty good. But would you say its prompt adherence is good enough to produce, say, a working Scratch program?

      • By kelipso 2026-01-18 15:39 | 1 reply

        Inputs to those functions are text: variable names, file names, directory names, symbol names for symbol search. The outputs you get back from functions like symbol search are text too, or at least easily reformatted to text. API calls are all just text in and text out.

        • By Wowfunhappy 2026-01-18 19:44

          Yes, and I frequently see Claude Code start with tools that retrieve these things when it's doing work. What are you surprised it isn't using?

    • By JimDabell 2026-01-18 0:49

      Kit looks like a good step in this direction:

      https://github.com/cased/kit

    • By girvo 2026-01-18 5:24 | 1 reply

      You can give agents the ability to check VSCode Diagnostics, LSP servers and the like.

      But they constantly ignore them and use their base CLI tools instead, it drives me batty. No matter what I put in AGENTS.md or similar, they always just ignore the more advanced tooling IME.

      • By worksonmine 2026-01-18 12:07 | 1 reply

        Doesn't have to be a bad thing, not all languages have good LSP support. If the AI can optimize for simple cross-language tools it won't be as dependent on the LSP implementation.

        I used grep and simple ctags to program in vanilla vim for years. It can be more useful than you'd think. I do like the LSP in Neovim and use it a lot, but I don't need it.

        • By girvo 2026-01-18 12:37

          I also lived in ctags land, but gosh I don’t miss it. LSPs are a step change, and most languages do have either an actual implementation or something similar enough that’s still more powerful than bare strings.

          It’s faster, too, as the model doesn’t need to scan for info, but again it really likes to try not to use it.

          Of course I still use rg and fd to traverse things, cli tools are powerful. I just wish LLMs could be made to use more powerful tools reliably!

    • By hahahahhaah 2026-01-17 23:12 | 1 reply

      An LSP MCP?

      • By ninkendo 2026-01-17 23:24 | 1 reply

        Yeah, or something even smarter than that.

        If you are willing to go language-specific, the tooling can be incredibly rich if you go through the effort. I’ve written some rust compiler drivers for domain-specific use cases, and you can hook into phases of the compiler where you have amazingly detailed context about every symbol in the code. All manner of type metadata, locations where values are dropped, everything is annotated with spans of source locations too. It seems like a worthy effort to index all of it and make it available behind a standard query interface the LLM can use. You can even write code this way, I think rustfmt hooks into the same pipeline to produce formatted code.

        I’ve always wished there were richer tools available to do what my IDE already does, but without needing to use the UI. Make it a standard API or even just CLI, and free it from the dependency on my IDE. It’d be very worth looking into I think.

        • By quantummagic 2026-01-18 0:34 | 1 reply

          If the compiler just dumped all that data out as structured text, you could use current LLMs to swallow it in a single gulp.

          • By ninkendo 2026-01-18 17:58

            Well the point is to avoid them needing to swallow it in a single gulp… after all, the source code is already all the information you need to get all this metadata.

            The use cases I have in mind are for codebases with many millions of lines of code, where just dumping it all into the context is unreasonably expensive. In these scenarios, it’d be beneficial to give the LLM a sort of SQL-like language it can use to prod at the code base in small chunks.

            In fact I keep thinking of SQL as an example in my head, but maybe it’s best to take it literally: why don’t we have a SQL for source code? Why can’t I do “select function.name from functions where parameters contains …” or similar (with clever subselects, joins, etc) to get back whatever exists in the code?

            It’s something I always wanted in general, not just for LLM’s. But LLM’s could make excellent use of it if there’s simply not enough context size to reasonably slurp up all the code.
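
            A toy version of that idea, assuming nothing beyond the Python standard library: index function metadata from the AST into in-memory SQLite, then query it with real SQL.

```python
# Toy "SQL for source code": index each function's name, line number,
# and parameter count into SQLite, then answer questions with SQL.
import ast
import sqlite3

def index_functions(source: str) -> sqlite3.Connection:
    """Build an in-memory table of (name, lineno, n_params) per function."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE functions (name TEXT, lineno INT, n_params INT)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            conn.execute(
                "INSERT INTO functions VALUES (?, ?, ?)",
                (node.name, node.lineno, len(node.args.args)),
            )
    return conn

# Usage: conn.execute("SELECT name FROM functions WHERE n_params >= 2")
```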

    • By rudedogg 2026-01-18 0:03 | 2 replies

      LSP also kind of sucks. But the problem is all the big companies want big valuations, so they only chase generic solutions. That's why everything is a VS Code clone, etc..

      https://paulgraham.com/ds.html

      • By dexwiz 2026-01-18 3:16

        I've never used an LSP plugin half as good as a JetBrains IDE.

      • By immibis 2026-01-18 5:04

        Always wondered what happened to the era of IDEs actually knowing the language you're using.

    • By ramraj07 2026-01-18 1:11 | 1 reply

      Not coding agents, but we do a lot of work trying to find the best tools, and the result is always that the simplest possible general tool that can get the job done beats a suite of complicated tools and rules on how to use them.

      • By eru 2026-01-18 2:55 | 1 reply

        Well, jump to definition isn't exactly complicated?

        And you can use whatever interface the language servers already use to expose that functionality to eg vscode?

        • By jhasse 2026-01-18 9:19 | 1 reply

          It can be: What definition to jump to if there are multiple (e.g. multiple Translation Units)? What if the function is overloaded and none of the types match?

          With grep it's easy: Always shows everything that matches.

          • By eru 2026-01-18 12:12

            Sure, there might be multiple definitions to jump to.

            With grep you get lots of false positives, and for some languages you need a lot of extra rules to know what to grep for. (E.g. in Python you might read `+` in the caller, but you actually need to grep for `__add__` to find the definition.)
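
            Concretely, in Python the call site spells `+` while the definition lives under `__add__`, so neither string finds both:

```python
# The call site uses `+`; the definition is named `__add__`.
# Grepping for "+" finds the use but not the definition, and
# grepping for "__add__" finds the definition but not the uses.
class Vec:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

    def __add__(self, other: "Vec") -> "Vec":
        return Vec(self.x + other.x, self.y + other.y)

v = Vec(1, 2) + Vec(3, 4)   # dispatches to Vec.__add__
```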

    • By elif 2026-01-18 17:29

      Surely there is an embedding for emacs giving it full elisp control

    • By BryantD 2026-01-18 1:16

      This isn’t completely the answer to what you want but skills do open a lot of doors here. Anything you can do on a command line can turn into a skill, after all.

    • By karlgkk 2026-01-18 2:55

      I’ve been saying this for a while: CPU demand is about to go through the roof.

      If you think about it, to make these tools most effective you have to be able to page things in and out of their context windows.

      What was once a couple of queries is now going to be dozens, hundreds, or even more from the LLM.

      For code, that means querying the AST in a way that lets you limit the size of the results.

      I wonder which SAST vendor Anthropic will buy.

    • By throwawaygo 2026-01-19 2:18

      Working on it

  • By Jaysobel 2026-01-17 19:56 | 3 replies

    • By theptip 2026-01-17 20:15 | 4 replies

      Did you eval using screenshots or some sort of rendered visualization instead of the CLI? I wonder if Claude has better visual intelligence when viewing images (lots of these in its training set) rather than ascii schematics (probably very few of these in the corpus).

      • By cheema33 2026-01-17 23:10 | 1 reply

        Computer use and screenshots are context-intensive; text is not. The more context you give to an LLM, the dumber it gets; some people think the LLM starts to get into the dumb zone at around 40% context utilization. That is where the limitations are as of today. This is why CLI based tools like Claude Code are so good. And any attempt at computer use has fallen by the wayside.

        There are some potential solutions to this problem that come to mind. Use subagents to isolate the interesting bits about a screenshot and only feed that to the main agent with a summary. This will all still have a significantly higher token usage compared to a text based interface, but something like this could potentially keep the LLM out of the dumb zone a little longer.

        • By fragmede 2026-01-17 23:37 | 1 reply

          > And any attempt at computer use has fallen by the wayside.

          You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)

          https://claude.com/blog/cowork-research-preview

          https://news.ycombinator.com/item?id=46593022

          More to the point though, you should be using Agents in Claude Code to limit context pollution. Agents run with their own context, and then only return salient details. Eg, I have an Agent to run "make" and return the return status and just the first error message if there is one. This means the hundreds/thousands of lines of compilation don't pollute the main Claude Code context, letting me get more builds in before I run out of context there.

          • By cheema33 2026-01-25 9:41

            >> And any attempt at computer use has fallen by the wayside.

            > You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)

            Claude Cowork does not do "computer use" in the traditional sense. e.g. it cannot use your computer to drive the interface of Adobe Premiere. It is not taking screenshots of your computer desktop, like a traditional "Computer use" product does.

      • By Jaysobel 2026-01-17 21:00

        I had tried the browser screenshotting feature for agents in Cursor and found it wasn't very reliable - screenshots eat a lot of context, and the agent didn't have a good sense for when to use them. I didn't try it in this project. I bet it would work in some specific cases.

      • By nanapipirara 2026-01-17 21:49

        Claude helped me immensely in getting an image converter to work. Giving it screenshots of the wrong output (lots of layers had unpredictable offsets that were not supposed to be there) and of the output as I expected it helped Claude understand the problems, and it fixed the bugs immediately.

      • By deepl_y 2026-01-19 1:50

        I'm not sure if this proves anything, but I saw this article on Opus playing Pokémon, where it was given actual screenshots, and it still navigated visual space pretty poorly despite the advancements: https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-i...

    • By cheschire 2026-01-18 11:53 | 1 reply

      Did you intend the last link to link to your project? It’s a copy of the OpenRCT2 project.

    • By fragmede 2026-01-17 23:32

      > Claude is at a pretty steep visuo-spatial disadvantage,

      How hard would it be to use with OpenAI's offerings instead? Particularly, imo, OpenAI's better at "looking" at pictures than Claude.

  • By rashidae 2026-01-17 19:23

    > As a mirror to real-world agent design: the limiting factor for general-purpose agents is the legibility of their environments, and the strength of their interfaces. For this reason, we prefer to think of agents as automating diligence, rather than intelligence, for operational challenges.

HackerNews