If AI writes code, should the session be part of the commit?

2026-03-02 0:27 · github.com

Keep track of your Codex sessions per commit.

git-memento is a Git extension that records the AI coding session used to produce a commit.

It runs a commit and then stores a cleaned markdown conversation as a git note on the new commit.

  • Create commits with normal Git flow (-m or editor).
  • Attach the AI session trace to the commit (git notes).
  • Keep provider support extensible (Codex first, others later).
  • Produce human-readable markdown notes.
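The mechanism underneath is plain `git notes`: a note attaches free-form text to an existing commit without changing its hash. A minimal sketch in a throwaway repository (the session text here is invented for illustration):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo User"

echo hello > file.txt
git add file.txt
git commit -q -m "Add file"
before=$(git rev-parse HEAD)

# Attach a note to HEAD; git-memento stores the cleaned session markdown this way
git notes add -m "- Provider: codex
- Session ID: abc123" HEAD

# The commit hash is unchanged; the note lives under refs/notes/commits
after=$(git rev-parse HEAD)
test "$before" = "$after"
git notes show HEAD
```

Because the note is a separate object, the commit history stays byte-identical whether or not sessions are attached.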

Initialize per-repository memento settings:

git memento init
git memento init codex
git memento init claude

init stores configuration in local git metadata (.git/config) under memento.*.
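For illustration, here is that layout set by hand: the key names are the ones documented further below, while the values are made up for the demo.

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q

# Keys as documented under memento.*; values here are illustrative
git config memento.provider codex
git config memento.codex.bin codex
git config memento.codex.getArgs "sessions get {id} --json"

# Everything lands in .git/config and can be inspected with plain git
git config --get-regexp '^memento\.'
```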

git memento commit <session-id> -m "Normal commit message"
git memento commit <session-id> -m "Subject line" -m "Body paragraph"
git memento amend -m "Amended subject"
git memento amend <new-session-id> -m "Amended subject" -m "Amended body"
git memento audit --range main..HEAD --strict
git memento doctor

Or:

git memento commit <session-id>

You can pass -m multiple times, and each value is forwarded to git commit in order. When -m is omitted, git commit opens your default editor.

amend runs git commit --amend.

  • Without a session id, it copies the note(s) from the previous HEAD onto the amended commit.
  • With a session id, it copies previous note(s) and appends the new fetched session as an additional session entry.
  • A single commit note can contain sessions from different AI providers.

Share notes with the repository remote (default: origin):

git memento share-notes
git memento share-notes upstream

This pushes refs/notes/* and configures local remote.<name>.fetch so notes can be fetched by teammates.
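Expressed as plain git against a local stand-in remote, share-notes amounts to roughly the following (the paths and the note text are invented for the sketch):

```shell
set -e
work=$(mktemp -d)
git init -q --bare -b main "$work/origin.git"
git init -q -b main "$work/repo"
cd "$work/repo"
git config user.email a@example.com
git config user.name Alice
git remote add origin "$work/origin.git"

git commit -q --allow-empty -m "c1"
git notes add -m "session note" HEAD
git push -q origin HEAD

# 1. Push every notes ref to the remote
git push -q origin 'refs/notes/*:refs/notes/*'
# 2. Configure the fetch mapping so a plain fetch brings notes down too
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*'
```

A teammate who clones the remote and fetches with the same refspec then sees the note on the commit.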

Push your branch and sync notes to the same remote in one command (default: origin):

git memento push
git memento push upstream

This runs git push <remote> and then performs the same notes sync as share-notes.

Sync and merge notes from a remote safely (default remote: origin, default strategy: cat_sort_uniq):

git memento notes-sync
git memento notes-sync upstream
git memento notes-sync upstream --strategy union

This command:

  • Ensures notes fetch mapping is configured.
  • Creates a backup ref under refs/notes/memento-backups/<timestamp>.
  • Fetches remote notes into refs/notes/remote/<remote>/*.
  • Merges remote notes into local notes and pushes synced notes back to the remote.
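The merge step can be reproduced with plain `git notes merge`. In this sketch two notes refs (names invented) hold different notes for the same commit, and cat_sort_uniq keeps the lines of both:

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q
git config user.email a@example.com
git config user.name Alice
git commit -q --allow-empty -m "c1"

# Two divergent notes for the same commit, in two notes refs
git notes --ref=refs/notes/commits add -m "local line" HEAD
git notes --ref=refs/notes/fetched add -m "remote line" HEAD

# cat_sort_uniq concatenates both notes' lines, sorts them, and de-duplicates
git notes --ref=refs/notes/commits merge --strategy cat_sort_uniq refs/notes/fetched
git notes --ref=refs/notes/commits show HEAD
```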

Configure automatic note carry-over for rewritten commits (rebase / commit --amend):

git memento notes-rewrite-setup

This sets local git config:

  • notes.rewriteRef=refs/notes/commits
  • notes.rewriteMode=concatenate
  • notes.rewrite.rebase=true
  • notes.rewrite.amend=true
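The effect of those settings is visible with plain git: once notes.rewriteRef is set, git itself carries the note across an amend. A sketch (the note text is made up):

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q
git config user.email a@example.com
git config user.name Alice

# The settings notes-rewrite-setup is documented to write
git config notes.rewriteRef refs/notes/commits
git config notes.rewriteMode concatenate
git config notes.rewrite.amend true
git config notes.rewrite.rebase true

git commit -q --allow-empty -m "original subject"
git notes add -m "session note" HEAD

# Amending rewrites the commit; git copies the note to the new SHA
git commit -q --allow-empty --amend -m "amended subject"
git notes show HEAD
```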

Carry notes from a rewritten range (for squash/rewrite flows) onto a new target commit:

git memento notes-carry --onto <new-commit> --from-range <base>..<head>

This reads notes from commits in <base>..<head> and appends a provenance block to <new-commit>.
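A plain-git approximation of that flow (the provenance wording below is invented, not the tool's exact format):

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q
git config user.email a@example.com
git config user.name Alice

git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)
git commit -q --allow-empty -m "work1"
git notes add -m "note on work1" HEAD
git commit -q --allow-empty -m "work2"
git notes add -m "note on work2" HEAD
head=$(git rev-parse HEAD)

# Simulate a squash: one new commit replaces the range
git reset -q --hard "$base"
git commit -q --allow-empty -m "squashed work"
new=$(git rev-parse HEAD)

# Append each note from the rewritten range, prefixed with its source commit
for sha in $(git rev-list "$base..$head"); do
  git notes append -m "Carried from $sha:
$(git notes show "$sha")" "$new"
done
git notes show "$new"
```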

Audit note coverage and note metadata in a commit range:

git memento audit --range main..HEAD
git memento audit --range origin/main..HEAD --strict --format json

  • Reports commits with missing notes (missing-note <sha>).
  • Validates note metadata markers (- Provider: and - Session ID:).
  • In --strict mode, invalid note structure fails the command.
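The coverage half of the audit can be approximated with a shell loop over the range; in this sketch the last commit is deliberately left without a note:

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q -b main
git config user.email a@example.com
git config user.name Alice

git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)
git commit -q --allow-empty -m "covered commit"
git notes add -m "- Provider: codex
- Session ID: abc123" HEAD
git commit -q --allow-empty -m "uncovered commit"

# Flag every commit in the range that has no note attached
for sha in $(git rev-list "$base..HEAD"); do
  git notes show "$sha" >/dev/null 2>&1 || echo "missing-note $sha"
done
```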

Run repository diagnostics for provider config, notes refs, and remote sync posture:

git memento doctor
git memento doctor upstream --format json

Show command help:

git memento help

Show installed tool version (major.minor + commit metadata when available):

git memento --version

Provider defaults can come from env vars, and init persists the selected provider + values in local git config:

  • MEMENTO_AI_PROVIDER (default: codex)
  • MEMENTO_CODEX_BIN (default: codex)
  • MEMENTO_CODEX_GET_ARGS (default: sessions get {id} --json)
  • MEMENTO_CODEX_LIST_ARGS (default: sessions list --json)
  • MEMENTO_CLAUDE_BIN (default: claude)
  • MEMENTO_CLAUDE_GET_ARGS (default: sessions get {id} --json)
  • MEMENTO_CLAUDE_LIST_ARGS (default: sessions list --json)

Set MEMENTO_AI_PROVIDER=claude to use Claude Code.

Runtime behavior:

  • If the repository is not configured yet, commit, amend <session-id>, push, share-notes, notes-sync, notes-rewrite-setup, and notes-carry fail with a message to run git memento init first.
  • Stored git metadata keys include:
    • memento.provider
    • memento.codex.bin, memento.codex.getArgs, memento.codex.listArgs
    • memento.claude.bin, memento.claude.getArgs, memento.claude.listArgs

If a session id is not found, git-memento asks Codex for available sessions and prints them.

Requires .NET SDK 10 and native toolchain dependencies for NativeAOT.

dotnet publish src/GitMemento.Cli/GitMemento.Cli.fsproj -c Release -r osx-arm64 -p:PublishAot=true
dotnet publish src/GitMemento.Cli/GitMemento.Cli.fsproj -c Release -r linux-x64 -p:PublishAot=true
dotnet publish src/GitMemento.Cli/GitMemento.Cli.fsproj -c Release -r win-x64 -p:PublishAot=true

Git discovers commands named git-<name> in PATH.

  1. Publish for your platform.
  2. Copy the produced executable to a directory in your PATH.
  3. Ensure the binary name is git-memento (or git-memento.exe on Windows).

Then run:

git memento commit <session-id> -m "message"
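The discovery mechanism itself is easy to verify with a stub executable standing in for the real binary (the stub below is purely illustrative):

```shell
set -e
bindir=$(mktemp -d)

# Any executable named git-<name> on PATH becomes runnable as `git <name>`
cat > "$bindir/git-memento" <<'EOF'
#!/bin/sh
echo "git-memento stub invoked with: $*"
EOF
chmod +x "$bindir/git-memento"

# Git forwards the remaining arguments to the external command unchanged
PATH="$bindir:$PATH" git memento commit abc123 -m "message"
```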

Install from latest GitHub release:

curl -fsSL https://raw.githubusercontent.com/mandel-macaque/memento/main/install.sh | sh
  • Release assets are built with NativeAOT (PublishAot=true) and packaged as a single executable per platform.
  • If the workflow runs from a tag push (for example v1.2.3), that tag is used as the GitHub release tag/name.
  • If the workflow runs from main without a tag, the release tag becomes <Version>-<shortSha> (for example 1.0.0-a1b2c3d4).
  • install.sh always downloads from releases/latest, so the installer follows the latest published GitHub release.

CI runs install smoke tests on Linux, macOS, and Windows that verify:

  • install.sh downloads the latest release asset for the current OS/architecture.
  • The binary is installed for the current user into the configured install directory.
  • git memento --version and git memento help both execute after installation.

Run the test suites:

dotnet test GitMemento.slnx
npm run test:js

This repository includes a reusable marketplace action with two modes:

  • mode: comment (default): reads git notes created by git-memento and posts a commit comment.
  • mode: gate: runs git memento audit as a CI gate and fails if note coverage checks fail. git-memento must already be installed in the job.

Action definition:

  • action.yml at repository root.
  • install/action.yml for reusable git-memento installation.
  • Renderer source: src/note-comment-renderer.ts
  • Runtime artifact committed for marketplace consumers: dist/note-comment-renderer.js

Example workflow:

name: memento-note-comments
on:
  push:
  pull_request:
    types: [opened, synchronize, reopened]
permissions:
  contents: write
  pull-requests: read
jobs:
  comment-memento-notes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: mandel-macaque/memento@v1
        with:
          mode: comment
          github-token: ${{ secrets.GITHUB_TOKEN }}

Inputs:

  • github-token (default: ${{ github.token }})
  • mode (default: comment) - comment or gate
  • notes-fetch-refspec (default: refs/notes/*:refs/notes/*)
  • max-comment-length (default: 65000)
  • audit-range (optional, gate mode)
  • base-ref (optional, gate mode pull request inference)
  • strict (default: true, gate mode)

Installer action inputs:

  • memento-repo (default: mandel-macaque/memento, release asset source)
  • install-dir (default: ${{ runner.temp }}/git-memento-bin)
  • verify (default: true)

CI gate example:

name: memento-note-gate
on:
  pull_request:
    types: [opened, synchronize, reopened]
permissions:
  contents: read
jobs:
  enforce-memento-notes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: mandel-macaque/memento/install@v1
        with:
          memento-repo: mandel-macaque/memento
      - uses: mandel-macaque/memento@v1
        with:
          mode: gate
          strict: "true"

Installer action example:

- uses: mandel-macaque/memento/install@v1
  with:
    memento-repo: mandel-macaque/memento

Local workflow in this repository:

  • .github/workflows/memento-note-comments.yml
  • .github/workflows/memento-note-gate.yml
  1. Build and commit the action renderer artifact:
npm ci
npm run build:action
git add src/note-comment-renderer.ts dist/note-comment-renderer.js
  2. Ensure action.yml and install/action.yml are in the default branch and README documents usage.
  3. Create and push a semantic version tag:
git tag -a v1.0.0 -m "Release GitHub Action v1.0.0"
git push origin v1.0.0
git tag -f v1 v1.0.0
git push -f origin v1
  4. In GitHub, open your repository page:
    • Releases -> Draft a new release -> choose v1.0.0 -> publish.
  5. Open the GitHub Marketplace publishing flow from the repository and submit listing metadata.
  6. Keep the major tag (v1) updated to the latest compatible release.
  • Notes are written with git notes add -f -m "<markdown>" <commit-hash>.
  • Multi-session notes use explicit delimiters:
    • <!-- git-memento-sessions:v1 -->
    • <!-- git-memento-note-version:1 -->
    • <!-- git-memento-session:start -->
    • <!-- git-memento-session:end -->
  • Legacy single-session notes remain supported and are upgraded to the versioned multi-session envelope when amend needs to append a new session.
  • Conversation markdown labels user messages with your git alias (git config user.name) and assistant messages with provider name.
  • Serilog debug logs are enabled in DEBUG builds.
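Given that envelope, extracting a session body is a matter of slicing between the start/end markers; a sketch with invented note content:

```shell
# Hypothetical note text using the delimiters listed above
note='<!-- git-memento-sessions:v1 -->
<!-- git-memento-note-version:1 -->
<!-- git-memento-session:start -->
- Provider: codex
- Session ID: abc123
<!-- git-memento-session:end -->'

# Select the start..end range, then drop the marker lines themselves
printf '%s\n' "$note" \
  | sed -n '/git-memento-session:start/,/git-memento-session:end/p' \
  | sed '1d;$d'
```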

Read the original article

Comments

  • By jedberg · 2026-03-02 6:44 · 21 replies

    The way I write code with AI is that I start with a project.md file, where I describe what I want done. I then ask it to make a plan.md file from that project.md to describe the changes it will make (or what it will create if Greenfield).

    I then iterate on that plan.md with the AI until it's what I want. I then ask it to make a detailed todo list from the plan.md and attach it to the end of plan.md.

    Once I'm fully satisfied, I tell it to execute the todo list at the end of the plan.md, and don't do anything else, don't ask me any questions, and work until it's complete.

    I then commit the project.md and plan.md along with the code.

    So my back and forth on getting the plan.md correct isn't in the logs, but that is much like intermediate commits before a merge/squash. The plan.md is basically the artifact an AI or another engineer can use to figure out what happened and repeat the process.

    The main reason I do this is so that when the models get a lot better in a year, I can go back and ask them to modify plan.md based on project.md and the existing code, on the assumption it might find its own mistakes.

    • By jumploops · 2026-03-02 8:03 · 5 replies

      I do something similar, but across three doc types: design, plan, and debug

      Design works similar to your project.md file, but on a per feature request. I also explicitly ask it to outline open questions/unknowns.

      Once the design doc (i.e. design/[feature].md) has been sufficiently iterated on, we move to the plan doc(s).

      The plan docs are structured like `plan/[feature]/phase-N-[description].md`

      From here, the agent iterates until the plan is "done" only stopping if it encounters some build/install/run limitation.

      At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

      We review these hypotheses, sometimes iterate, and then tackle them one by one.

      An important note for debug flows, similar to manual debugging, it's often better to have the agent instrument logging/traces/etc. to confirm a hypothesis, before moving directly to a fix.

      Using this method has led to a 100% vibe-coded success rate both on greenfield and legacy projects.

      Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

      • By miki123211 · 2026-03-02 9:58

        My "heavy" workflow for large changes is basically as follows:

        0. create a .gitignored directory where agents can keep docs. Every project deserves one of these, not just for LLMs, but also for logs, random JSON responses you captured to a file etc.

        1. Ask the agent to create a file for the change, rephrase the prompt in its own words. My prompts are super sloppy, full of typos, with 0 emphasis put on good grammar, so it's a good first step to make sure the agent understands what I want it to do. It also helps preserve the prompt across sessions.

        2. Ask the agent to do research on the relevant subsystems and dump it to the change doc. This is to confirm that the agent correctly understands what the code is doing and isn't missing any assumptions. If something goes wrong here, it's a good opportunity to refactor or add comments to make future mistakes less likely.

        3. Spec out behavior (UI, CLI etc). The agent is allowed to ask for decisions here.

        4. Given the functional spec, figure out the technical architecture, same workflow as above.

        5. High-level plan.

        6. Detailed plan for the first incomplete high-level step.

        7. Implement, manually review code until satisfied.

        8. Go to 6.

      • By jedberg · 2026-03-02 8:28 · 2 replies

        > At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

        I'm biased because my company makes a durable execution library, but I'm super excited about the debug workflow we recently enabled when we launched both a skill and MCP server.

        You can use the skill to tell your agent to build with durable execution (and it does a pretty great job the first time in most cases) and then you can use the MCP server to say things like "look at the failed workflows and find the bug". And since it has actual checkpoints from production runs, it can zero in on the bug a lot quicker.

        We just dropped a blog post about it: https://www.dbos.dev/blog/mcp-agent-for-durable-workflows

        • By zknill · 2026-03-02 9:20 · 1 reply

          Why an MCP? dbos already ships a cli that appears to have the same features. Why an MCP over a skill that gives context on using the cli?

          https://docs.dbos.dev/python/reference/cli

          • By jumploops · 2026-03-02 9:51

            > we launched both a skill and MCP server.

            My guess is that the MCP was easy enough to add, and some tools only support MCP.

            Personal opinion: MCP is just codified context pollution.

        • By jumploops · 2026-03-02 8:41 · 1 reply

          This is great, giving agents access to logs (dev or prod) tightens the debug flow substantially.

          With that said, I often find myself leaning on the debug flow for non-errors e.g. UI/UX regressions that the models are still bad at visualizing.

          As an example, I added a "SlopGoo" component to a side project, which uses an animated SVG to produce a "goo" like effect. Ended up going through 8 debug docs[0] until I was satisfied.

          [0]https://github.com/jumploops/slop.haus/tree/main/debug

          • By nubinetwork · 2026-03-02 10:43

            > giving agents access to logs (dev or prod) tightens the debug flow substantially.

            Unless the agent doesn't know what it's doing... I've caught Gemini stuck in an edit-debug loop making the same 3-4 mistakes over and over again for like an hour, only to take the code over to Claude and get the correct result in 2-3 cycles (like 5-10 minutes)... I can't really blame Gemini for that too much though, what I have it working on isn't documented very well, which is why I wanted the help in the first place...

      • By danenania · 2026-03-02 16:16

        I have a similar process and have thought about committing all the planning files, but I've found that they tend to end up in an outdated state by the time the implementation is done.

        Better imo is to produce a README or dev-facing doc at the end that distills all the planning and implementation into a final authoritative overview. This is easier for both humans and agents to digest than bunch of meandering planning files.

      • By frumiousirc · 2026-03-02 11:51

        > Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

        FWIW, what you describe maps well to Beads. Your directory structure becomes dependencies between issues, and/or parent/children issue relationship and/or labels ("epic", "feature", "bug", etc). Your markdown moves from files to issue entries hidden away in a JSONL file with local DB as cache.

        Your current file-system "UI" vs Beads command line UI is obviously a big difference.

        Beads provides a kind of conceptual bottleneck which I think helps when used with LLMs. Beads is more self-documenting, while a file-system can be "anything".

      • By wek · 2026-03-02 14:21

        Similar, but we have the agent write the test cases after writing the plan and then iterate until it passes the test cases.

    • By frank00001 · 2026-03-02 7:24 · 6 replies

      Sounds like the spec driven approach. You should take a look at this https://github.com/github/spec-kit

      • By kriro · 2026-03-02 12:05

        I basically use a spec driven approach except I only let Github Spec Kit create the initial md file templates and then fill them myself instead of letting the agent do it. Saves a ton of tokens and is reasonably quick and I actually know I wrote the specs myself and it contains what I want. After I'm happy with the md file "harness" I let the agents loose.

        The most frustrating issues that pop up are usually library/API conflicts. I work with Gymnasium or PettingZoo and RLlib or stable-baselines3. The APIs are constantly out of sync so it helps to have a working environment where libraries and APIs are in sync beforehand.

      • By jedberg · 2026-03-02 8:25

        Sort of, depending on if your spec includes technology specifics.

        For example it might generate a plan that says "I will use library xyz", and I'll add a comment like "use library abc instead" and then tell it to update the plan, which now includes specific technology choices.

        It's more like a plan I'd review with a junior engineer.

        I'll check out that repo, it might at least give me some good ideas on some other default files I should be generating.

      • By shinycode · 2026-03-02 8:00

        Thanks for the link! I’m very curious about their choices and methods, I’ll try it

      • By wolletd · 2026-03-02 7:36 · 1 reply

        > 110 releases in 6 months

        • By sethammons · 2026-03-02 10:52

          Almost a release per work day, esp. if you count standard holidays.

      • By WXLCKNO · 2026-03-02 16:04

        or OpenSpec https://github.com/Fission-AI/OpenSpec/

        I think it's much better

      • By malloryerik · 2026-03-02 9:52

        Have you tried this? Review?

    • By dmd · 2026-03-02 12:36

      https://github.com/obra/superpowers "brainstorming" is pretty much exactly this workflow, and it's great.

    • By shinycode · 2026-03-02 7:58

      I also do that and it works quite well to iterate on spec md files first. When every step is detailed and clear and all md files linked to a master plan that Claude code reads and updates at every step it helps a lot to keep it on guard rails. Claude code only works well on small increments because context switching makes it mix and invent stuff. So working by increments makes it really easy to commit a clean session and I ask it to give me the next prompt from the specs before I clear context. It always go sideways at some point but having a nice structure helps even myself to do clean reviews and avoid 2h sessions that I have to throw away. Really easier to adjust only what’s wrong at each step. It works surprisingly well

    • By nesarkvechnep · 2026-03-02 17:20 · 1 reply

      By that time you would’ve written the code yourself, only better.

      • By cortesoft · 2026-03-02 17:49 · 1 reply

        I am sure this is partly tongue in cheek, but no, you can’t have written the code yourself in that amount of time. Would the code be better if you wrote it? Probably, depending on your coding skills.

        But it would not be faster.

        OP is talking about creating an entire project, from scratch, and having it feature complete at the end.

    • By anbende · 2026-03-02 14:20 · 1 reply

      Here’s how I do the same thing, just with a slightly different wrapper: I’m running my own stepwise runtime where agents are plugged into defined slots.

      I’ll usually work out the big decisions in a chat pane (sometimes a couple panes) until I’ve got a solid foundation: general guidelines, contracts, schemas, and a deterministic spec that’s clear enough to execute without interpretation.

      From there, the runtime runs a job. My current code-gen flow looks like this:

      1. Sync the current build map + policies into CLAUDE|COPILOT.md
      2. Create a fresh feature branch
      3. Run an agent in “dangerous mode,” but restricted to that branch (and explicitly no git commands)
      4. Run the same agent again—or a different one—another 1–2 times to catch drift, mistakes, or missed edge cases
      5. Finish with a run report (a simple model pass over the spec + the patch) and keep all intermediate outputs inspectable

      And at the end, I include a final step that says: “Inspect the whole run and suggest improvements to COPILOT.md or the spec runner package.” That recommendation shows up in the report, so the system gets a little better each iteration instead of just producing code.

      I keep tweaking the spec format, agent.md instructions and job steps so my velocity improves over time.

      --- To answer the original article's question. I keep all the run records including the llm reasoning and output in the run record in a separate store, but it could be in repo also. I just have too many repos and want it all in one place.

      • By CompoundLoop · 2026-03-02 17:46 · 1 reply

        What store do you use for your run records? A separate git repo? or do you have some SQL lite db holding the records.

        • By anbende · 2026-03-02 18:27

          Hi there. Right now they are going to a separate git repo, yes. Like this:

          local-governor/epics/e-epics/e014-clinical-domain-model/runs/run-e014-01-ops-catalog-20260302-173907-244c82

          Attempts/
          Steps/
            Step 1
            Step 2
            ...
            Step 13
          job_def.yaml
          job_instance.json
          changes_final.patch
          run_report.md
          improvement_suggestions.md

          local-governor is my store for epics, specs, run records, schemas, contracts, etc. No logic, just files. I want all this stuff in a DB, but it's easier to just drop a file path into my spec runner or into a chat window (vscode chat or cli tool), but I'm tinkering with an alt version on a cloud DB that just projects to local files... shrug. I spend about as much time on tooling as actual features :)

    • By RHSeeger · 2026-03-02 15:20

      I do something similar:

      • A full work description in markdown (including pointers to tickets, etc), but not in a file
      • A "context" markdown file that I have it create once the plan is complete... that contains "everything important that it would need to regenerate the plan"
      • A "plan" markdown file that I have it create once the plan is complete

      The "context" file is because, sometimes, it turns out the plan was totally wrong and I want to purge the changes locally and start over; discussing what was done wrong with it; it gives a good starting point. That being said, since I came up with the idea for this (from an experience it would have been useful and I did not have it) I haven't had an experience where I needed it. So I don't know how useful it really is.

      None of that ^ goes into the repo though; mostly because I don't have a good place to put it. I like the idea though, so I may discuss it with my team. I don't like the idea of hundreds of such files winding up in the main branch, so I'm not sure what the right approach is. Thank you for the idea to look into it, though.

      Edit: If you don't mind going into it, where do you put the task-specific md files into your repo, presumably in a way that doesn't stack up over time and cause ... noise?

    • By giancarlostoro · 2026-03-02 19:44

      This is how I used to use Beads before I made GuardRails[0]. I basically iterate with the model, ask it to do market research, review everything it suggests, and you wind up with a "prompt" that tells it what to do and how to work that was designed by the model using its own known verbiage. Having learned about how XML could be used to influence Claude I'm rethinking my flow and how GuardRails behaves.

      [0]: https://giancarlostoro.com/introducing-guardrails-a-new-codi...

    • By adam_patarino · 2026-03-02 13:44 · 1 reply

      You check the plan files into git? Don’t you end up with dozens of md files?

      I’ve been copying and pasting the plan into the linear issue or PR to save it, but keep my codebase clean.

      • By thearn4 · 2026-03-02 14:59

        Yeah I had the same question. I suppose you could put the project+plan text into the commit message?

    • By 8note · 2026-03-02 19:42

      The real question is when peer feedback and review happens.

      Is the project file made collaborative between multiple engineers? The plan file?

      I've tried some variants of sharing different parts, but it feels like it's almost wasted effort if the LLM then still goes through multiple iterations to get what's right; the original plan and project get lost a bit against the details of what happened in the resulting chat.

    • By the-grump · 2026-03-02 7:19 · 2 replies

      Stealing this brilliant idea. Thank you for sharing!

      • By jedberg · 2026-03-02 8:23

        I wish I could say I came up with it, but it's just a small variation on something I saw here on HN!

      • By peyton · 2026-03-02 7:32

        For big tasks you can run the plan.md’s TODOs through 5.2 pro and tell it to write out a prompt for xyz model. It’ll usually greatly expand the input. Presumably it knows all the tricks that’ve been written for prompting various models.

    • By winwang · 2026-03-02 16:26

      Interesting! I actually split up larger goals into two plan files: one detailed plan for design, and one "exec plan" which is effectively a build graph but the nodes are individual agents and what they should do. I throw the two-plan-file thing into a protocol md file along with a code/review loop.

    • By odiroot · 2026-03-02 16:59

      How do you use your agent effectively for executing such projects in bigger brownfield codebases? It's always a balance between the agent going way too far into NIH vs burning loads and loads of tokens for the initial introspection.

    • By matkoniecz · 2026-03-02 12:08

      I do the same, but put it as a comment on top of generated file.

      (So far I have not used LLMs to generate code larger than fitting in one file.)

      Overall idea is that I modify and tweak prompt, and keep starting new LLM sessions and dispose of old ones.

    • By stackghost · 2026-03-02 6:49 · 3 replies

      >I then iterate on that plan.md with the AI until it's what I want.

      Which tools/interface are you using for this? Opencode/claude code? Gas town?

      • By StrangeSound · 2026-03-02 6:51 · 1 reply

        I find that Antigravity is really good for this. You can comment on the plan documents in-line.

        • By d1sxeyes · 2026-03-02 11:43

          Best feature of Antigravity

      • By anshumankmr · 2026-03-02 7:29

        While I have not committed my personal mind map, I just had Claude Code write it down for me. Plus I have a small CLAUDE.md and copilot-instructions.md that mention the various intricacies of what I am working on, so the agent knows to refer to those files.

      • By jedberg · 2026-03-02 8:23

        I'm using the Claude desktop app and vi at the moment. But honestly I would probably do better with a more modern editor with native markdown support, since that's mostly what I'm writing now.

    • By tlb · 2026-03-02 12:24

      Do you clear the file and use the same name for the next commit? Or create a new directory with a plan.md for each set of changes?

    • By fhub · 2026-03-02 8:29 · 1 reply

      I do something similar, but I get Claude to review Codex every step of the way and feed it back (or vice versa, depending on the day)

      • By jedberg · 2026-03-02 8:34

        My next step was to add in having another LLM review Claude's plans. With a few markdown artifacts it should be easy for the other LLM to figure it out and make suggestions.

    • By vorticalbox · 2026-03-02 11:46

      you may like openspec[0]

      [0] https://openspec.dev/

    • By iainmck29 · 2026-03-02 12:12 · 1 reply

      Is this not what entire.io is doing? It was founded by former GitHub CEO Thomas Dohmke.

    • By Bombthecat · 2026-03-02 12:51

      Then you might like to look into automaker.

    • By ryanmcl · 2026-03-02 14:53

      [dead]

  • By 827a · 2026-03-02 4:49 · 12 replies

    IMO: This might be a contrarian opinion, but I don't think so. It's much the same problem as asking, for example, if every single line you write, or every function, becomes a commit. The answer to this granularity is, much like anything, you have to think of the audience: Who is served by persisting these sessions? I would suspect that there is little reason why future engineers, or future LLMs, would need access to them; they likely contain a significant amount of noise, incorrect implementations, and red herrings. The product of the session is what matters.

    I do think there's more value in ensuring that the initial spec, or the "first prompt" (which IME is usually much bigger and tries to get 80% of the way there) is stored. And, maybe part of the product is an LLM summary of that spec, the changes we made to the spec within the session, and a summary of what is built. But... that could be the commit message? Or just in a markdown file. Or in Notion or whatever.

    • By arppacket · 2026-03-02 6:39 · 8 replies

      While it's noisy and complicated for humans to read through, this session info is primarily for future AI to read and use as additional input for their tasks.

      We could have LLMs ingest all these historical sessions, and use them as context for the current session. Basically treat the current session as an extension of a much, much longer previous session.

      Plus, future models might be able to "understand" the limitations of current models, and use the historical session info to identify where the generated code could have deviated from user intention. That might be useful for generating code, or just more efficient analysis by focusing on possible "hotspots", etc.

      Basically, it's high time we start capturing any and all human input for future models, especially open source model development, because I'm sure the companies already have a bunch of this kind of data.

      • By woctordho · 2026-03-02 8:30

        That's exactly one of the reasons I've been archiving the sessions using DataClaw. The sessions can contain more useful information than the comments for humans.

        [0] https://github.com/peteromallet/dataclaw

      • By staticassertion 2026-03-02 9:56 (1 reply)

        TBH I don't think it's worth the context space to do this. I'm skeptical that this would have any meaningful benefits vs just investing in targeted docs, skills, etc.

        I already keep a "benchmarks.md" file to track commits and benchmark results + what did/did not work. I think that's far more concise and helpful than the massive context that was used to get there. And it's useful for a human to read, which I think is good. I prefer things remain maximally beneficial to both humans and AI - disconnects seem to be problematic.

        • By arppacket 2026-03-02 18:35 (1 reply)

          Might not be worth it now, but might be in future. Not just for future LLMs, but future AI architectures.

          I don't think the current transformer architecture is the final stop in the architectural breakthroughs we need for "AGI" that mimics the human thought process. We've gone through RNN, LSTM, Mamba, Transformers, with exponentially increasing amounts of data over the years. If we want to use similar "copy human sequences" approaches all the way to AGI, we need to continuously record human thoughts, so to speak (and yes, that makes me really queasy).

          So persisting the session, which is already available in a convenient form for AI, is also about capturing the human reasoning process during the session, and the sometimes inherent heuristics therein. I agree that it's not really useful for humans to read.

          • By staticassertion 2026-03-02 20:21 (1 reply)

            I just don't really see the point in hedging like that tbh. I think you could justify almost anything on "it could be useful", but why pay the cost now? Eh.

            • By aiisjustanif 2026-03-03 21:14

              Optimizing and over-engineering too soon has gone out the window.

      • By JeremyNT 2026-03-02 13:05

        But AI can just read the diff. The natural language isn't important.

      • By serial_dev 2026-03-02 10:48

        Or just "write a good commit message based on our session, pls", then both humans and llms can use it.

      • By ZeroGravitas 2026-03-02 8:15 (1 reply)

        Similarly, git logs of existing human code seem to be a good source of info that llms don't look at unless explicitly prompted to do so.

        • By arppacket 2026-03-02 16:57

          Right now, it might not be worth the cost. That might change in future so that they consider it by default?

      • By JustFinishedBSG 2026-03-02 11:26

        > While it's noisy and complicated for humans to read through, this session info is primarily for future AI to read and use as additional input for their tasks.

        Context rot is very much a thing, and it may still be for future agents. Dumping tens or hundreds of thousands of trash tokens into context very much worsens the performance of the agent.

      • By nsonha 2026-03-02 16:12

        It's just noise for AI too. There is no reason to be lazy with context management when you can simply ask the AI to write the summary of the session. But even that is hardly useful when AI can just read the source of truth which is the code and committed docs

      • By jfoster 2026-03-02 8:15

        Future AIs can probably infer the requirements better than humans can write them.

    • By eru 2026-03-02 5:18 (2 replies)

      > Its much the same problem as asking, for example, if every single line you write, or every function, becomes a commit.

      Hmm, I think that's the wrong comparison? The more useful comparison might be: should all your notes you made and dead ends you tried become part of the commit?

      • By panarky 2026-03-02 7:15 (3 replies)

        When a human writes the code should all their slack messages about the project be committed into the repo?

        • By fragmede 2026-03-02 7:58

          That would be amazing! In the moment, it's a lot of noise, but say you're trying to figure out a bit of code that Greg wrote four years ago and oh btw he's no longer with the company. Having access to his emails and Slack would be amazing context to try to reverse-engineer and figure out whytf he did what he did. Did he just pick a thing and run with it, so I can replace it and not worry about it, or was it a very intentional choice and do-not-replace, because everything else will break?

        • By blharr 2026-03-02 7:32

          Ideally, yes? Or a reference ticket number pointing to that discussion

          The main limitation is the human effort to compile that information, but if the LLM already has the transcript ready, it's free.

        • By woctordho 2026-03-02 8:33 (2 replies)

          Ideally, yes. Although Slack is vendor lock-in, and we need a better platform to archive the sessions.

      • By mocamoca 2026-03-02 7:24 (1 reply)

        In some cases this is what I ask from my juniors. Not for every commit, but during some specific reviews. The goal is to coach them on why and how they got a specific result.

        • By adithyassekhar 2026-03-02 8:16 (1 reply)

          What is a junior? I don't see it in claude.

          • By kubanczyk 2026-03-02 15:19 (1 reply)

            It's how a middle manager can improve its standing, so the Junior will be a thing in bigger orgs for quite a while.

            • By adithyassekhar 2026-03-03 6:02 (1 reply)

              Both the manager and junior are a cost center for the company tbh if there are fewer employees to manage. Already seeing it here on this side of the pond: https://www.reddit.com/r/developersIndia/comments/1rinv3z/ju...

              • By kubanczyk 2026-03-03 11:49

                Companies (C-suites) do not actually want their worker pool (humans + agents) to stay constant over time; there is no reason for it to stay constant. C-suites have very different worries.

                And "cost center" is a lie from the Outsourcing Era; forget about it.

    • By rzerowan 2026-03-02 5:26 (5 replies)

      This is a central problem that we've already seen proliferate wildly in scientific research, and if the same is now allowed to become embedded in foundational code, the future outlook is grim.

      Replication crisis[1].

      Given initial conditions, and even accounting for 'noise', would an LLM arrive at the same output? It should, for the same reason math problems require one to show one's working. Scientific papers require methods and pseudocode while also requiring limitations to be stated.

      Without similar guardrails, maintenance and extension of future code becomes a choose-your-own-adventure, where you have to guess at the intent and conditions of the LLM used.

      [1] https://www.ipr.northwestern.edu/news/2024/an-existential-cr...

      • By 827a 2026-03-02 5:36

        Agentic engineering is fundamentally different, not just because of the inherent unpredictability of LLMs, but also because there's a wildly good chance that two years from now Opus 4.6 will no longer even be a model anyone can use to write code with.

      • By majormajor 2026-03-02 6:03 (2 replies)

        You can leave commit messages or comments without spamming your history with every "now I'm inspecting this file..." or "oops, that actually works differently than I expected" transcript.

        In fact, I'd wager that all that excess noise would make it harder to discern meaningful things in the future than simply distilling the meaningful parts of the session into comments and commit messages.

        • By AlexCoventry 2026-03-02 6:38

          IMO, you should do both. The cost of intellectual effort is dropping to zero, and getting an AI to scan through a transcript for relevant details is not going to cost much at all.

        • By devmor 2026-03-02 7:11

          Those messages are part of the linguistic context used to generate the code, though. Don’t confuse them for when humans (or human written programs) display progress messages.

          If they aren’t important for your specific purposes, you can summarize them with an LLM.

      • By JustFinishedBSG 2026-03-02 11:30

        > for the same reason math problems require one to show their working.

        We don't put our transitional proofs in papers, only the final best one we have. So that analogy doesn't work.

        For every proof in a paper there is probably 100 non-working / ugly sketches or just snippets of proofs that exist somewhere in a notebook or erased on a blackboard.

      • By veunes 2026-03-02 12:01

        Even if you pin the seed and spin up your own local LLM, changes to continuous batching at the vLLM level or just a different CUDA driver version will completely break your bitwise float convergence. Reproducibility in ML generation is a total myth, in prod we only work with the final output anyway

      • By itemize123 2026-03-02 5:36 (1 reply)

        but we've been doing the same without llm. what're the new pieces which llm would bring in?

        • By rzerowan 2026-03-02 5:52

          With normal practice, say I'm reading through the Linux source for a particular module: I'd be able to reference mailing lists and patchsets, which by convention have to be human-parsable/reviewable, with the history/comments/git blame etc. putting in one's headspace the frame of reference that produced it.

    • By Muromec 2026-03-02 6:29

      There is some potential value for audit if you work in a special place where you are sworn in and where transparency is important, but who is going to read all of that, and how do you even know that the transcript corresponds to the code if the committer is up to something?

    • By solarkraft 2026-03-02 5:43 (2 replies)

      I agree that probably not everything should be stored - it’s too noisy. But the reason the session is so interesting is precisely the later part of the conversation - all the corrections in the details, where the actual, more precise requirements crystallize.

      • By insin 2026-03-02 7:43 (3 replies)

        AKA the code. You're all talking about the code.

        • By medstrom 2026-03-02 8:12 (1 reply)

          The prompt is the code :) The code is like a compiled binary. How long until we put the prompts in `src/` and the code in `bin/`, I wonder...

          • By kubanczyk 2026-03-02 15:42

            I call false dilemma. OP probably defines "code" as one of the languages precise enough to be suited for steering Turing machines. Thus, "code" is not the opposite of "prompt". They are apples and oranges.

            Lawyers can code in English, but it is not to the layperson's advantage, is it?

            And for example, if you prompt for something to frobnicate biweekly, there is no intelligence today, and there never will be, that can extract from the prompt whether you want the Turing machine to act twice a week or once per two weeks. It's a deficiency of the language, not of the intelligence.

        • By solarkraft 2026-03-02 11:23

          Not at all, unless it contains very thorough reasoning comments (which arguably it should). The code is only an artifact, a lot of which is incidental and flexible. The prompts contain the actual constraints.

        • By whywhywhywhy 2026-03-02 9:56

          People are trying to retain value as their value is being evaporated.

      • By slashdave 2026-03-02 6:05 (2 replies)

        Then just summarize the final requirements

        • By solarkraft 2026-03-02 11:28

          That’s what I do! I think it works well and helps future agents a lot in understanding why the codebase is the way it is. I do have to oversee the commit messages, but it does avoid a lot of noise and maybe it’s a normal part of HITL development.

        • By lsaferite 2026-03-02 12:14

          If it's non-trivial work, have the Agent distill it down to an ADR.

    • By wickedsight 2026-03-02 8:07

      > Who is served by persisting these sessions? I would suspect that there is little reason why future engineers, or future LLMs, would need access to them

      I disagree. When working on legacy code, one of my biggest issues is usually the question 'why is this the way it is?' Devs hate documentation, Jira often isn't updated with decisions made during programming, so sometimes you just have to guess why 'wait(500)' or 'n = n - 1' are there.

      If it was written with AI and the conversation history is available, I can ask my AI: 'why is this code here?', which would often save me a ton of time and headache when touching that code in the future.
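If the session really was attached as a git note (as the linked tool does), that lookup is cheap with stock git. A sketch, where the path, string, and note contents are placeholder examples; `git log -S` (the "pickaxe") finds the commits that introduced or removed the string:

```shell
# Pickaxe search: list commits that changed the number of occurrences
# of the mysterious string (newest first); tail -1 takes the oldest,
# i.e. the commit that most likely introduced it.
# src/worker.c is a placeholder for your own file.
sha=$(git log -S 'wait(500)' --format=%H -- src/worker.c | tail -1)

# Read whatever session note is attached to that commit
# (git notes live under refs/notes/*, outside the commit itself).
git notes show "$sha"
```

From there you could feed the note's transcript to your AI and ask the "why is this code here?" question directly.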

    • By JeremyNT 2026-03-02 13:04

      I think this too. I use the initial spec from the issue tracker as the prompt and work from there.

      The missteps the agent takes and the nudging I do along the way are ephemeral, and new models and tooling will behave differently.

      If you have the original prompt and the diff you have everything you need.

    • By stackghost 2026-03-02 5:05

      LLM session transcripts as part of the commit is a neat idea to consider, to be sure, but I know that I damn well don't want to read eight pages of "You're absolutely right! It's not a foo. It's a bar" slop (for each commit no less!) when I'm trying to find someone to git blame.

      The solution is as it always has been: the commit message is where you convey to your fellow humans, succinctly and clearly, why you made the commit.

      I like the idea of committing the initial transcript somewhere in the docs/ directory or something. I'll very likely start doing this in my side projects.

    • By matchagaucho 2026-03-02 5:42 (1 reply)

      For me, it’s about preserving optionality.

      If I can run resume {session_id} within 30 days of a file’s latest change, there’s a strong chance I’ll continue evolving that story thread—or at least I’ve removed the friction if I choose to.

      • By majormajor 2026-03-02 6:06

        It seems unlikely that a file that hasn't changed in 30 days in an environment with a lot of "agents" cranking away on things is going to be particularly meaningful to revisit with the context from 30 days ago, vs using new context with everything that's been changed and learned since then.

    • By xlii 2026-03-02 14:08

      > Its much the same problem as asking, for example, if every single line you write, or every function, becomes a commit.

      As a huge fan of atomic commits, I'd say the smallest logical piece should be a commit. I've never seen "intention-in-a-commit", i.e. multiple changes with an overarching goal, influence reviews. There's usually some kind of ticket that can be linked to the code itself if needed.

    • By D-Machine 2026-03-02 5:18

      First N prompts is a good / practical heuristic for something worth storing (whether N = 1 or greater).

    • By notedbrew 2026-03-02 5:28 (1 reply)

      You ignore the reality of vibe coding. If someone just prompts, never reads the code, and barely tests the result, then the prompts can be a valuable insight.

      But I am not rooting for either, just saying.

      • By refactor_master 2026-03-02 5:54

        If A vibes, and B is overwhelmed with noise, how does B reliably go through it? If using AI, this necessarily faces the same problems that recording all A's actions was trying to solve in the first place, and we'd be stuck in a never-ending cycle.

        We could also distribute the task to B, C, D, ... N actors, and assume that each of them would "cover" (i.e. understand) some part of A's output. But this suddenly becomes very labor intensive for other reasons, such as coordination and trust that all the reviewers cover adequately within the given time...

        Or we could tell A that this is not a vibe playground and fire them.

  • By dang 2026-03-02 3:43 (20 replies)

    I floated that idea a week ago: https://news.ycombinator.com/item?id=47096202, although I used the word "prompts" which users pointed out was obsolete. "Session" seems better for now.

    The objections I heard, which seemed solid, are (1) there's no single input to the AI (i.e. no single session or prompt) from which such a project is generated,

    (2) the back-and-forth between human and AI isn't exactly like working with a compiler (the loop of source code -> object code) - it's also like a conversation between two engineers [1]. In the former case, you can make the source code into an artifact and treat that as "the project", but you can't really do that in the latter case, and

    (3) even if you could, the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't add much value.

    At the same time, people have been submitting so many Show HNs of generated projects, often with nothing more than a generated repo with a generated readme. We need a better way of processing these because treating them like old-fashioned Show HNs is overwhelming the system with noise right now [2].

    I don't want to exclude these projects, because (1) some of them are good, (2) there's nothing wrong with more people being able to create and share things, (3) it's foolish to fight the future, and (4) there's no obvious way to exclude them anyhow.

    But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.

    So, community: what should we do?

    [1] this point came from seldrige at https://news.ycombinator.com/item?id=47096903 and https://news.ycombinator.com/item?id=47108653.

    YoumuChan makes a similar point at https://news.ycombinator.com/item?id=47213296, comparing it to Google search history. The analogy is different but the issue (signal/noise ratio) is the same.

    [2] Is Show HN dead? No, but it's drowning - https://news.ycombinator.com/item?id=47045804 - Feb 2026 (422 comments)

    • By amarant 2026-03-02 4:44

      My current thinking is based on Boris Tane's[1] formalised method of coding with Claude Code. I commit the research and plan.md files as they are when I finally tell Claude to implement changes in code. This becomes a living lexicon of the architecture and every feature added. A very slight variation I make to Boris's method is that I prefix all my research and plan .md filenames with the name of the feature. I can very quickly load relevant architecture into context by having Claude read a previous design document instead of analysing the whole code base. I'll take pieces I think are relevant and tell Claude to base research on those design documents.

      [1] https://boristane.com/blog/how-i-use-claude-code/

    • By majormajor 2026-03-02 6:09

      > But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.

      IMO it's not the lack of context that makes them uninteresting. It's the fact that the bar for "this took effort and thought to make" has moved, so it's just a lot easier to make things that we would've considered interesting two years ago.

      If you're asking HN readers to sift through additional commit history or "session transcripts" in order to decide if it's interesting, because there's a lot of noise, you've already failed. There's gonna be too much noise to make it worth that sifting. The elevator pitch is just gonna need to be that much different from "vibe coded thing X" in order for a project to be worth much.

    • By sillysaurusx 2026-03-02 4:05 (3 replies)

      Unfortunately Codex doesn’t seem to be able to export the entire session as markdown, otherwise I’d suggest encouraging people to include that in their Show HNs. It’s kind of nuts that it’s so difficult to export what’s now a part of the engineering process.

      I don’t have anything against vibe coded apps, but what makes them interesting is to see the vibe coding session and all the false starts along the way. You learn with them as they explore the problem space.

      • By dang 2026-03-02 4:10 (1 reply)

        mthurman pointed me to https://static.simonwillison.net/static/2025/claude-code-mic... - is that what you have in mind?

        • By sillysaurusx 2026-03-02 4:33 (1 reply)

          Yeah! That’s great. Having those alongside vibe coded apps would make them way more interesting.

          • By duggan 2026-03-02 12:57

            I've been tinkering away on one of these myself, https://rockstar.ninja. I expect there are a hundred others out there, going to be interesting to see what the end shape of these tools is.

      • By esperent 2026-03-02 4:08

        I don't think it's hard to export; on the contrary, it's all already saved in your ~/.claude directory, so you could write a tool to convert the data there to markdown.
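A minimal sketch of such a converter. Claude Code's on-disk session format is undocumented, so the JSONL schema assumed here (one JSON object per line, carrying a `message` with `role` and `content`) is an assumption that may not match what your version writes:

```python
import json
from pathlib import Path


def session_to_markdown(jsonl_path: Path) -> str:
    """Render a JSONL session transcript as markdown.

    ASSUMPTION: each line is a JSON object whose "message" field has a
    "role" ("user"/"assistant") and a "content" that is either a plain
    string or a list of typed blocks. Adjust to your actual files.
    """
    parts = []
    for line in jsonl_path.read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        message = record.get("message") or {}
        role = message.get("role")
        content = message.get("content")
        if role not in ("user", "assistant"):
            continue  # skip tool results, summaries, etc.
        if isinstance(content, list):
            # Keep only text blocks, drop tool-use blocks.
            content = "\n".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        if content:
            parts.append(f"### {role.title()}\n\n{content}\n")
    return "\n".join(parts)
```

Pointing it at a session file under ~/.claude would then yield a markdown transcript of just the user/assistant turns, suitable for attaching to a Show HN or a commit.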

      • By woctordho 2026-03-02 8:39

        You can export it with DataClaw. By default it outputs jsonl and publishes to HuggingFace, but you can also do analysis locally with it.

    • By tempestn 2026-03-02 4:32 (2 replies)

      Why does the regular voting system fail here? Are there just too many Show HNs for people to process the new ones, so the good ones get lost in the noise?

      • By dang 2026-03-02 4:56

        Yes I believe that's it.

      • By th0ma5 2026-03-02 4:36

        [dead]

    • By maxbond 2026-03-02 8:29

      > So, community: what should we do?

      My diagnosis is that the friction that existed before (the effort to create a project) was filtering out low-effort projects and keeping the amount of submissions within the community's capacity to handle. Now that the friction is greatly reduced, there's more low-effort content and it's beyond the community's capacity (which is the real problem).

      So there's two options: increase the amount of friction or increase the capacity. I don't think the capacity options are very attractive. You could add tags/categories to create different niches/queues. The most popular tags would still be overwhelmed but the more niche ones would prosper. I wouldn't mind that but I think it goes against the site's philosophy so I doubt you'll be interested.

      So what I would propose is to create a heavier submission process.

      - Make it so you may only submit 1 Show HN per week.

      - Put it into a review queue so that it isn't immediately visible to everyone.

      - Users who are eligible to be reviewers (maybe their account is at least a year old, maybe they've posted to Show HN at least once) can volunteer to provide feedback (as comments) and can approve the submission.

      - If it gets approved by N people, it gets posted.

      - If the submitter can't get the approvals they need, they can review the feedback and submit again next week.

      High-effort projects should sail through. Projects that aren't sufficiently effortful or don't follow the Show HN guidelines (e.g. it's account-walled) get the opportunity to apply more polish and try again.

      A note on requirements for reviewers: A lot of the best comments come from people with old accounts who almost never post and so may have less than 100 karma. My interpretation is that these people have a lot of experience but only comment when they have an especially meaningful contribution. So I would suggest having requirements for account age (to make it more difficult to approve yourself from a sockpuppet) but being very flexible with karma.

    • By grey-area 2026-03-02 7:33

      1. Comments - Ban fully automated HN comments/accounts - can’t think of any reason to allow these or others to have to read them.

      2. Require submissions which use GAI to have a text tag in the title ("Show HN GAI" would be fine, for example) - this would be a good first step and can be policed mostly by readers.

      I do think point 1 is important to prevent fully automated voting rings etc.

      Point 2 is preparation for some other treatment later - perhaps you could ask for a human written explanation on these ones?

      I don't think any complex or automated requirements are going to be enforceable, so keep it simple. I also wonder whether Show posts are the whole problem - I've noticed a fair few blogspam posts using AI to write huge meandering articles.

    • By airstrike 2026-03-02 9:07

      1. I think at a minimum we need a separate "Show HN" for AI posts, that people can filter out, so that users are not incentivized to spam Show HNs hoping to make it to the front page

      2. Then that separate group, call it "Vibe HN", gets to decide what they find valuable through their own voting and flagging.

      Some guidelines on what makes a good "Vibe HN" post would be helpful to nudge the community towards the things you're suggesting, but I think (1) cutting off self-promotion incentives given the low cost of creating software now and (2) allowing for self-moderation given the sheer number of submissions is the only tenable path

    • By wging 2026-03-02 4:47 (1 reply)

      Regarding the noise you mention, I wonder if memento's use of the git 'notes' feature is an acceptable way to contain or quarantine that noise. It might still not add much value, but at least it would live in a separate place that is easily filtered out when the user judges it irrelevant. Per the README of the linked repo,

      > It runs a commit and then stores a cleaned markdown conversation as a git note on the new commit.

      So it doesn't seem that normal commit history is affected - git stores notes specially, outside of the commit (https://git-scm.com/docs/git-notes).

      In fact GitHub doesn't even display them, according to some (two-year-old) blog posts I'm seeing. Not sure about other interfaces to git (magit, other forges), but git log is definitely able to ignore them (https://git-scm.com/docs/git-log#Documentation/git-log.txt--...).

      This doesn't mean the saved artifacts would necessarily be valuable - just that, unlike a more naive solution (saving in commit messages or in some directory of tracked files) they may not get in the way of ordinary workflows aside from maybe bloating the repo to some degree.
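The mechanics the comment describes can be checked with stock git; `origin` is the assumed remote name, and the note text is a placeholder:

```shell
# Attach a note to HEAD. The commit hash does not change, because the
# note lives in its own ref (refs/notes/commits by default), not inside
# the commit object.
git notes add -m "session: cleaned transcript goes here" HEAD

# Plain `git log` prints a Notes: section once notes exist,
# but they are trivially filtered out...
git log --no-notes

# ...and they are only shared when pushed/fetched explicitly;
# a normal push or clone skips refs/notes/* entirely.
git push origin 'refs/notes/*'
git fetch origin 'refs/notes/*:refs/notes/*'
```

So the transcript bloat sits in a side channel: anyone who never fetches refs/notes/* never sees it.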

      • By mandel_x 2026-03-02 5:16 (1 reply)

        You are 100% right, and that's why I chose git notes. If you do not sync them, you have no knowledge of their existence.

    • By Lerc 2026-03-02 13:46

      From my perspective, I have two projects that I have considered [Show HN] posts for. One of those I have not yet posted because I have not yet completed writing up the process I used to construct it (a non-trivial project in an artifact). Without that commentary it falls into a different class, which I agree shouldn't be outright excluded, but is of less general interest. The other project I think some people would be interested in just for what it is in itself; I just want to add a bit more to it.

      Perhaps [Show HN] for things that have commentary or highlight a particular thing. It's a bit nebulous because it gets to be like Wikipedia's notability and is more of a judgement call.

      But if that is backed up with a [Creations], simply for things that have been made that people might like or because you are proud of your achievement.

      So if you write a little Chess engine, it goes under [Creations]. If it is a Chess engine in 1k, or written in BrainFuck, or has a discussion on how you did it, it goes under [Show HN]

      [Creations] would be much less likely to hit the front page of course, but I think there might need to be a nudge to push the culture towards recognising that being on the front page should not be the goal.

      For reference here are the two things, coming to a [Show HN] near you (maybe).

      https://fingswotidun.com/PerfBoard/ (Just an app, Commentary would be the value.)

      https://lerc.neocities.org/ (this is just neat (to a certain mind anyway), awaiting some more polish)

    • By d--b 2026-03-02 13:10 (1 reply)

      The issue is that there are more HN submissions than the community is able to process. But you could say the same of the front page, which is mostly a fairly small sample of the good stuff that goes through /new.

      So you could treat Show HN the same way: what gets floated on /show is only a small sample of the good stuff in /shownew, and be fine with the idea that a lot of the good Show HNs just slip through the cracks. Which seems to me like the best alternative. Possibly with a /showpool, maybe?

      You could split Show HN into categories, but you'd have done it by now if you thought it a good idea.

      You could also rate Show HN submissions algorithmically trying to push for those projects that have been around longer and that look like more effort has been put into them, but I guess that's kind of hard.

      Or you'd have to hire actual people to pre-sort the submissions, and gut all the ones that are not up to par. In fact, if there was a human-based approval system for new Show HNs, you'd possibly get fewer submissions and higher-quality ones, which in itself would make the work of sorting through them simpler.

      • By adampunk 2026-03-02 21:35 (1 reply)

        Where is this deluge tho? In the last week how many have we seen hit the front page? A dozen? That Mathematica clone, the ZX spectrum emulator, the poorly named rtk, and…like 1-2 more are what I can remember from the last week that got popular.

        That’s…pretty manageable.

        • By d--b 2026-03-02 23:24 (1 reply)

          They’re in /shownew. the last 30 Show HN submissions were sent in the last two hours. I think that’s a lot more than what we used to see.

          • By adampunk 2026-03-03 14:46

            OK, so I just stopped in there and I saw two projects that looked like they were generated by AI. The real flood seems to be scam blog posts.

            I guess, from reading about it, it would seem like this is a four-alarm fire. But I don't see that when I go look around.

    • By bandrami 2026-03-02 3:58

      Plenty of commits link to mailing list discussions about the proposed change, maybe something like that, with an archive of LLM sessions?

    • By esperent 2026-03-02 4:07

      > the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't really add that much value.

      This is the major blocker for me. However, there might be value in saving a summary - basically the same as what you would get from taking meeting notes and then summarizing the important points.

    • By pjc50 2026-03-02 9:48

      All the agentic AI projects remind me of "draw the rest of the owl": https://knowyourmeme.com/memes/how-to-draw-an-owl - there's a lot of steps missing.

      Unlike many people, I'm on the trailing edge of this. Company is conservative about AI (still concerned about the three different aspects of IP risk) and we've found it not very good at embedded firmware. I'm also in the set of people who've been negatively polarized by the hype. I might be willing to give it another go, but what I don't see from the impressive Show HN projects (e.g. the WINE clone from last week) is .. how do you get those results?

    • By killingtime74 2026-03-02 4:21

      Also the models change all the time and are not deterministic

    • By tptacek 2026-03-02 6:18

      A starting point would be excluding Show HNs with generated READMEs, or that lack human-written explanations.

    • By mandel_x 2026-03-02 5:25

      > people have been submitting so many Show HNs of generated projects

      In this case, it was more "write the X language compiler using X". I had to prove to myself whether keeping the session made sense, and what better way to do it than to vibe code the tool that audits vibe code.

      I do get your point though

    • By grayhatter 2026-03-02 5:02 (1 reply)

      > So, community: what should we do?

      > Is Show HN dead? No, but it's drowning

      Is spam on topic? And are AI codegen bots part of the community?

      To me, the value of Show HN was rarely the thing itself; it was the work and attention that someone put into it. AI bots don't do work. (What they do is worthy of its own word, but it's not the same as work.)

      > I don't want to exclude these projects, because (1) some of them are good,

      Most of them are barely passable at best, but I say that as a very biased person. But I'll reiterate my previous point: I'm willing to share my attention with people who've invested significant amounts of their own time. SIGNIFICANT amounts of their time, not their tokens.

      > (2) there's nothing wrong with more people being able to create and share things

      This is true, but only in isolation. Here, the topic is more what to do about all this new noise (not whether people should share things they think are cool). If the noise drowns out the signal, you've allowed that noise to ruin something that was useful.

      > (3) it's foolish to fight the future

      coward!

      I do hope you take that in the tongue-in-cheek way I meant it, because I say it as a friend would; but I refuse to resign myself completely to fatalism. Fighting the future is different from letting people who are doing something different ruin the good thing you currently have. Sure, electric cars are the future, but that's no reason to welcome them into a group that loves rebuilding classic hot rods.

      > (4) there's no obvious way to exclude them anyhow.

      You got me there. But then, I just have to take your word for it, because it's not a problem I've spent a lot of time figuring out. Even then, I'd say it's a cultural problem. If people, ahem, in a leadership position, commented that Show HN is reserved for projects that took a real investment of time, and not just ideas with code... eventually the problem would solve itself, no? The inertia may take some time, but then this whole comment is about time...

      I know it's not anymore, but to me HN still somehow feels like a niche community. Given that, I'd like to encourage you to optimize for the people who want to invest time into getting good at something. A very small number of these projects could become those, but trying to optimize for the best fairness to everyone, time spent be damned... I believe that will turn away the people who lift the quality of HN.

    • By jgraham 2026-03-02 12:57 · 1 reply

      > it's foolish to fight the future

      And yet, the premise of the question assumes that it's possible in this case.

      Historically having produced a piece of software to accomplish some non-trivial task implied weeks, months, or more of developing expertise and painstakingly converting that expertise into a formulation of the problem precise enough to run on a computer.

      One could reasonably assume that any reasonable-looking submission was in fact the result of someone putting in the time to refine their understanding of the problem, and express it in code. By discussing the project one could reasonably hope to learn more about their understanding of the problem domain, or about the choices they made when reifying that understanding into an artifact useful for computation.

      Now that no longer appears to be the case.

      Which isn't to say there's no longer any skill involved in producing well-engineered software that continues to function over time. Or indeed that there aren't classes of software that require interesting novel approaches that AI tooling can't generate. But now anyone with an idea, some high-level understanding of the domain, and a few hundred dollars a month to spend can write out a plan and ask an AI provider to generate software implementing it. That software may or may not be good, but determining that requires a significant investment of time.

      That shift fundamentally changes the dynamics of "Show HN" (and probably much else besides).

      It's essentially the same problem that art forums had with AI-generated work. Except they have an advantage: people generally agree that there's some value to art being artisan; the skill and effort that went into producing it are — in most cases — part of the reason people enjoy consuming it. That makes it rather easy to at least develop a policy to exclude AI, even if it's hard to implement in practice.

      But the most common position here is that the value of software is what it does. Whilst people might intellectually prefer 100 lines of elegant lisp to 10,000 lines of spaghetti PHP to solve a problem, the majority view here is that if the latter provides more economic value — e.g. as the basis of a successful business — then it's better.

      So now the cost of verifying things for interestingness is higher than the cost of generating plausibly-interesting things, and you can't even have a blanket policy that tries to enforce a minimum level of effort on the submitter.

      To engage with the original question: if one was serious about extracting the human understanding from the generated code, one would probably take a leaf from the standards world where the important artifact is a specification that allows multiple parties to generate unique, but functionally equivalent, implementations of an idea. In the LLM case, that would presumably be a plan detailed enough to reliably one-shot an implementation across several models.

      However I can't see any incentive structure that might cause that to become a common practice.

      • By adampunk 2026-03-02 21:37 · 1 reply

        >a plan detailed enough to reliably one-shot an implementation across several models.

        What? Why should this be an output? Why, if I make a project, should I also be responsible for making this, an entirely different and much more difficult and potentially impossible project? If I come and show you a project that required thousands of sessions to make, do I also have to show you how to one-shot it in multiple models? Does that even make sense?

        • By jgraham 2026-03-03 9:24 · 1 reply

          To be clear: I don't think it will happen.

          But the point of comparison is something like the HTML specification. That's supposed to be a document that is detailed enough about how to create an implementation that multiple different groups can produce compatible implementations without having any actual code in common.

          In practice it still doesn't quite work: the specification has to be supplemented with testsuites that all implementations use, and even then there often needs to be a feedback loop where new implementations find new ambiguities or errors, and the specification needs to be updated. Plus implementors often "cheat" and examine each other's behaviour or even code, rather than just using the specification.

          Nevertheless it's perhaps the closest thing I'm familiar with to an existing practice where the plan is considered canonical, and therefore worth thinking about as a model for what "code as implementation detail" would entail in other situations.

          • By adampunk 2026-03-03 14:44

            I think the looping part is what stops this from being a practical solution. If we imagine that the actual code required some iteration to get right, I don't know that we could say there is a one-shot equivalent without testing that. Sometimes there may not even be an equivalent.

            It's possible that the solution to code being an implementation detail is to be less precious about it, not more. I don't really have an answer here, and I don't think anyone does, because it's all very new and it is hard to manage.

            There's also a pretty normal way in which this is going to diverge, and perhaps already has. Developers are building local bespoke skills, just as they used to (and still do) develop local bespoke code to make their work more efficient. They may be able to do something that you or I cannot using the same models; there's no way to homologize their output. It would be like asking someone to commit their dotfiles alongside the project output. Regardless of whether or not it was the right thing to do, no one would do it.

    • By acedTrex 2026-03-02 4:02 · 3 replies

      > (2) there's nothing wrong with more people being able to create and share things

      There are very clearly many things wrong with this when the things being shown require very little skill or effort.

      • By dang 2026-03-02 4:07 · 2 replies

        That is by no means all of these projects. I'm not interested in a circle-the-wagons crackdown because it won't work (see "it's foolish to fight the future" above), and because we should be welcoming and educating new users in how to contribute substantively to HN.

        • By imiric 2026-03-02 6:39

          Which users?

          The future you're concerned with defending includes bots being a large part of this community, potentially the majority. Those bots will not only submit comments autonomously, but create these projects, and Show HN threads. I.e. there will be no human in the loop.

          This is not unique to this forum; it applies to the internet at large. We're drowning in bot-generated content, and now it is fully automated.

          So the fundamental question is: do you want to treat bots as human users?

          Ignoring the existential issue: whichever answer you choose will inevitably alienate a portion of existing (human) users. It's silly that I have to say this, but bots don't think, nor "care", and will keep coming regardless.

          To me the obvious answer is "no". All web sites that wish to preserve their humanity will have to do a complete block of machine-generated content, or, at the very least, filter and categorize it correctly so that humans who wish to ignore it, can. It's a tough nut to crack, but I reckon YC would know some people capable of tackling this.

          It's important to note that this state of a human driving the machine directly is only temporary. The people who think these are tools like any other are sorely mistaken. This tool can do their minimal-effort job much more efficiently, cheaply, and with better results, and it's only a matter of time until the human is completely displaced. This will take longer for more complex work, of course, but creating regurgitated projects on GitHub and posting content on discussion forums is a very low-bar activity.

        • By lelanthran 2026-03-02 11:45

          > That is by no means all of these projects. I'm not interested in a circle-the-wagons crackdown because it won't work (see "it's foolish to fight the future" above), and because we should be welcoming and educating new users in how to contribute substantively to HN.

          Is it really that difficult to identify bot accounts right now? Or people who create an HN account only to post their project?

          That seems like low-hanging fruit that should be picked immediately.

      • By newswasboring 2026-03-02 12:22

        Why exactly is the skill level required for something a gating parameter?

      • By CuriouslyC 2026-03-02 4:52

        Taking a good picture requires very little effort once you've found yourself in the right place. You gonna shit on Ansel Adams?
