Compiler Explorer and the promise of URLs that last forever

2025-05-2816:28357189xania.org

Show article

The history is this: back in the old days (2012), we used to store the entire Compiler Explorer state in the URL. That got unwieldy (who would have thought encoding an entire compiler state in a URL might get a bit long?), so we added support for Google’s link shortener goo.gl in March 2014. That meant short links were of the form goo.gl/abc123. Clicking a goo.gl link would eventually redirect you to the full URL link on our site, and we’d decode the state from the URL.

In 2016, Stack Overflow banned link shorteners because of how they cloak the actual destination of links. Abusers could post innocent goo.gl links that directed folks unwittingly to bad content. However, that meant our Compiler Explorer links were also affected. At the time, we had no intention of storing any user data, so we came up with a hack: we still used goo.gl, but we then rewrote the link we handed out to be godbolt.org/g/abc123 (where the abc123 is the goo.gl unique ID). We then redirected any hits to /g/abc123 to goo.gl/abc123, which then (finally) redirected back to godbolt.org with the appropriate state in the URL. If you’re keeping track, that’s three redirects to show you some assembly code. We were really committed to making things complicated. Later, we used Google’s API to avoid the redirection dance.

By 2018, the limitations of storing state in the URL started to bite. There’s a limit to how long a URL can be (and we’d already started compressing the data in the URL), so we needed a better solution. We finally implemented our own storage solution: we hash the input, save the state as a JSON document on S3 under the hash, and then give out a shortened form of the hash as a godbolt.org/z/hashbit URL. We use DynamoDB to store the mapping of shortened hashes to the full paths (accounting for partial collisions, etc.). And, amusingly, we check the short link’s hash for rude words and add deliberate extra information into the document until we no longer get a rude word. Yes, we literally check if your shortened URL contains profanity. Because apparently even random hashes can’t be trusted to keep it clean. This led to bug #1297, which remains one of my favourite issues we’ve ever had to fix.

We still support the godbolt.org/g/abc123 links, but… despite Google solemnly promising that “all existing links will continue to redirect to the intended destination,” it went read-only a few years back, and now they’re finally sunsetting it in August 2025. Here I was in 2014, thinking I was so clever using Google’s shortener. “It’ll be around forever!” I said. “Google never discontinues products!” I said. Er…

That means we’ll no longer be able to resolve goo.gl-based links! Which is, to use technical terminology, a bit pants. One of my founding principles is that Compiler Explorer links should last forever. I can’t do anything about the really legacy actual goo.gl links, but I can do something about the godbolt.org/g/abc123 links!

Over the last few days, I’ve been scraping everywhere I can think of, collating the links I can find out in the wild, and compiling my own database of links¹ – and importantly, the URLs they redirect to. So far, I’ve found 12,000 links from scraping:

Google (using their web search API)
GitHub (using their API)
Our own (somewhat limited) web logs
The archive.org Stack Overflow data dumps
Archive.org’s own list of archived webpages

12,298 rescued links and counting - not bad for a few days of digital archaeology

We’re now using the database in preference to goo.gl internally, so I’m also keeping an eye on new “g” links that we don’t yet have.

Thanks to Peter Cordes for reminding us about this issue and bringing it to our attention².

If you have a secret cache of godbolt.org/g/abc123 links you have lying around, now’s the time to visit each of them! That will ensure they’re in my web logs and I’ll add them to the database. Otherwise, sadly, in August 2025 those links will stop working, joining the great digital graveyard alongside Flash games and GeoCities pages.

The Bigger Picture

This whole saga reinforces why I’m skeptical of relying on third-party services for critical infrastructure. Google’s URL shortener was supposed to be permanent. The redirect chains we built were clever workarounds that bought us time, but ultimately, the only way to truly keep a promise of “URLs that last forever” is to own the entire stack.

It’s been a fascinating archaeological dig through the internet, hunting down these legacy links like some sort of digital Indiana Jones, except instead of ancient artifacts I’m rescuing compiler flags and optimization examples. Each one represents someone’s attempt to share knowledge, ask a question, or demonstrate a concept. Preserving them feels like preserving a small piece of programming history.

So if you’ve got old Compiler Explorer links bookmarked somewhere, dust them off and give them a click. You’ll be helping preserve a little corner of the internet’s shared knowledge – and keeping a promise I made back in 2012. And hey, at least this time I’m in control of the infrastructure. What could possibly go wrong?

Disclaimer

This article was written by a human, but links were suggested by and grammar checked by an LLM.

Read the original article

Comments

By kccqzy 2025-05-2817:1012 reply

Before 2010 I had this unquestioned assumption that links are supposed to last forever. I used the bookmark feature of my browser extensively. Some time afterwards, I discovered that a large fraction of my bookmarks were essentially unusable due to linkrot. My modus operandi after that was to print the webpage as a PDF. A bit afterwards when reader views became popular reliable, I just copy-pasted the content from the reader view into an RTF file.

By lappa 2025-05-2819:197 reply

I use the SingleFile extension to archive every page I visit.

It's easy to set up, but be warned, it takes up a lot of disk space.

    $ du -h ~/archive/webpages
    1.1T /home/andrew/archive/webpages

https://github.com/gildas-lormeau/SingleFile

By internetter 2025-05-2821:342 reply

storage is cheap, but if you wanted to improve this:

1. find a way to dedup media

2. ensure content blockers are doing well

3. for news articles, put it through readability and store the markdown instead. if you wanted to be really fancy, instead you could attempt to programatically create a "template" of sites you've visited with multiple endpoints so the style is retained but you're not storing the content. alternatively a good compression algo could do this, if you had your directory like /home/andrew/archive/boehs.org.tar.gz and inside of the tar all the boehs.org pages you visited are saved

4. add fts and embeddings over the pages

By ashirviskas 2025-05-2822:341 reply

1 and partly 3 - I use btrfs with compression and deduping for games and other stuff. Works really well and is "invisible" to you.

By bombela 2025-05-293:29

dedup on btrfs requires to setup a cronjob. And you need to pick one of the dedup too. It's not completely invisible in my mind bwcause of this ;)

By windward 2025-05-298:381 reply

>storage is cheap

It is. 1.1TB is both:

- objectively an incredibly huge amount of information

- something that can be stored for the cost of less than a day of this industry's work

Half my reluctance to store big files is just an irrational fear of the effort of managing it.

By IanCal 2025-05-299:28

> - something that can be stored for the cost of less than a day of this industry's work

Far, far less even. You can grab a 1TB external SSD from a good name for less than a days work at minimum wage in the UK.

I keep getting surprised at just how cheap large storage is every time I need to update stuff.

By davidcollantes 2025-05-2819:541 reply

How do you manage those? Do you have a way to search them, or a specific way to catalogue them, which will make it easy to find exactly what you need from them?

By nirav72 2025-05-292:361 reply

KaraKeep is a decent self hostable app that has support for receiving singlefile pages via singlefile browser extension and pointing to karakeep API. This allows me to search for archived pages. (Plus auto summarization and tagging via LLM).

By dotancohen 2025-05-295:451 reply

Very naive question, surely. What does KaraKeep provide that grep doesn't?

By nirav72 2025-05-2922:261 reply

jokes aside. It has a mobile app

By dotancohen 2025-05-304:021 reply

I don't get it aside. How does that help him search files on his local file system? Or is he syncing an index of his entire web history to his mobile device?

By nirav72 2025-05-3022:001 reply

GP is using SingleFile browser extension. Which allows him to download the entire page as a single .html file. But SingleFile also allows sending that page to Karakeep directly instead of downloading it to his local file system. (if he's hosting karakeep on a NAS on his network). He can then use the mobile app or Karakeep web UI to search and view that archived page. Karakeep does the indexing. (Including auto-tagging via LLM)

By dotancohen 2025-05-3119:44

I see now, thank you.

By snthpy 2025-05-295:202 reply

Thanks. I didn't know about this and it looks great.

A couple of questions:

- do you store them compressed or plain?

- what about private info like bank accounts or health issuance?

I guess for privacy one could train oneself to use private browsing mode.

Regarding compression, for thousands of files don't all those self-extraction headers add up? Wouldn't there be space savings by having a global compression dictionary and only storing the encoded data?

By d4mi3n 2025-05-2914:03

> do you store them compressed or plain?

Can’t speak to your other issues but I would think the right file system will save you here. Hopefully someone with more insight can provide color here, but my understanding is that file systems like ZFS were specifically built for use cases like this where you have a large set of data you want to store in a space efficient manner. Rather than a compression dictionary, I believe tech like ZFS simply looks at bytes on disk and compresses those.

By genewitch 2025-05-297:18

By default, singlefile only saves when you tell it to, so there's no worry about leaking personal information.

I haven't put the effort in to make a "bookmark server" that will accomplish what singlefile does but on the internet because of how well singlefile works.

By shwouchk 2025-05-2823:341 reply

i was considering a similar setup, but i don’t really trust extensions. Im curious;

- Do you also archive logged in pages, infinite scrollers, banking sites, fb etc? - How many entries is that? - How often do you go back to the archive? is stuff easy to find? - do you have any organization or additional process (eg bookmarks)?

did you try integrating it with llms/rag etc yet?

By eddd-ddde 2025-05-2913:121 reply

You can just fork it, audit the code, add your own changes, and self host / publish.

By shwouchk 2025-06-037:30

yes, you right. im not helpless and all the new ai tools make this even easier.

By nyarlathotep_ 2025-05-2914:13

Are you automating this in some fashion? Is there another extension you've authored or similar to invoke SingleFile functionality on a new page load or similar?

By dataflow 2025-05-2914:391 reply

Have you tried MHTML?

By RiverCrochet 2025-05-2915:281 reply

SingleFile is way more convenient as it saves to a standard HTML file. The only thing I know that easily reads MHTML/.mht files is Internet Explorer.

By dataflow 2025-05-2915:311 reply

Chrome and Edge read them just fine? The format is actually the same as .eml AFAIK.

By RiverCrochet 2025-05-2918:311 reply

I remember having issues but it could be because the .mht's I had were so old I think I used Internet Explorer's Save As... function to generate them.

By dataflow 2025-05-2921:28

I've had such issues with them in the past too, yeah. I never figured out the root cause. But in recent times I haven't had issues, for whatever that's worth. (I also haven't really tried to open many of the old files either.)

By 90s_dev 2025-05-2820:14

You must have several TB of the internet on disk by now...

By flexagoon 2025-05-2817:163 reply

By the way, if you install the official Web Archive browser extension, you can configure it to automatically archive every page you visit

By petethomas 2025-05-2817:551 reply

This a good suggestion with the caveat that entire domains can and do disappear: https://help.archive.org/help/how-do-i-request-to-remove-som...

By Akronymus 2025-05-2910:45

That's especially annoying when a formerly useful site gets abandoned, a new owner picks up the domain, then gets IA to delete the old archives as well.

Or even worse, when a domain parking company does that: https://archive.org/post/423432/domainsponsorcom-erasing-pri...

By internetter 2025-05-2821:352 reply

recently I've come to believe even IA and especially archive.is are ephermal. I've watched sites I've saved disappear without a trace, except in my selfhosted archives.

A technological conundrum, however, is the fact that I have no way to prove that my archive is an accurate representation of a site at a point in time. Hmmm, or maybe I do? Maybe something funky with cert chains could be done.

By akoboldfrying 2025-05-2822:381 reply

There are timestamping services out there, some of which may be free. It should (I think) be possible to basically submit the target site's URL to the timestamping service, and get back a certificate saying "I, Timestamps-R-US, assert that the contents of https://targetsite.com/foo/bar downloaded at 12:34pm on 29/5/2025 hashes to abc12345 with SHA-1", signed with their private key and verifiable (by anyone) with their public key. Then you download the same URL, and check that the hashes match.

IIUC the timestamping service needs to independently download the contents itself in order to hash it, so if you need to be logged in to see the content there might be complications, and if there's a lot of content they'll probably want to charge you.

By XorNot 2025-05-2822:452 reply

Websites don't really produce consistent content even from identical requests though.

But you also don't need to do this: all you need is a service which will attest that it saw a particular hashsum at a particular time. It's up to other mechanisms to prove what that means.

By account42 2025-06-028:16

> But you also don't need to do this: all you need is a service which will attest that it saw a particular hashsum at a particular time. It's up to other mechanisms to prove what that means.

"That URL served a particular hash at a particular time" or "someone submitted a particular hash at a particular time" provide very different guarantees and the latter will be insufficient to prove your archive is correct.

By akoboldfrying 2025-05-2823:25

> Websites don't really produce consistent content even from identical requests though.

Often true in practice unfortunately, but to the extent that it is true, any approach that tries to use hashes to prove things to a third party is sunk. (We could imagine a timestamping service that allows some kind of post-download "normalisation" step to strip out content that varies between queries and then hash the results of that, but that doesn't seem practical to offer as a free service.)

> all you need is a service which will attest that it saw a particular hashsum at a particular time

Isn't that what I'm proposing?

By shwouchk 2025-05-2823:351 reply

sign it with gpg and upload the sig to bitcoin

edit: sorry, that would only prove when it was taken, not that it wasn’t fabricated.

By fragmede 2025-05-2823:391 reply

hash the contents

By shwouchk 2025-05-293:321 reply

signing it is effectively the same thing. question is how to prove that what you hashed is what was there?

By chii 2025-05-296:032 reply

you can't, because unless you're not the only one with a copy, your hash cannot be verified (since both hash and claim comes from you).

One way to make this work is to have a mechanism like bitcoin (proof of work), where the proof of work is put into the webpage itself as a hash (made by the original author of that page). Then anyone can verify that the contents wasn't changed, and if someone wants to make changes to it and claim otherwise, they'd have to put in even more proof of work to do it (so not impossible, but costly).

By notpushkin 2025-05-2913:311 reply

I think there was a way to preserve TLS handshake information in a way that something something you can verify you got the exact response from the particular server? I can’t look it up now though, but I think there was a Firefox add-on, even.

By account42 2025-06-028:18

I don't think how this can work. While the handshake uses asymmetric crypto, that step then gives you a symmetric key that will be used for the actual content. You need that key to decrypt the content but if you have it you can also use it to encrypt your own content and substitute it in the encrypted stream.

By fragmede 2025-05-2915:43

what if instead of the proof of work being in the page as a hash, that the distributed proof of work is that some subset of nodes download a particular bit of html or json from a particular URI, and then each node hashes that, saves the contents and the hash to a blockchain-esque distributed database. Subject to 51% attack, same as any other chain, but still.

By vitorsr 2025-05-2817:55

> you can configure it to automatically archive every page you visit

What?? I am a heavy user of the Internet Archive services, not just the Wayback Machine, including official and "unofficial" clients and endpoints, and I had absolutely no idea the extension could do this.

To bulk archive I would manually do it via the web interface or batch automate it. The limitations of manually doing it one by one are obvious, and the limitations of doing it in batches requires, well, keeping batches (lists).

By 90s_dev 2025-05-2818:052 reply

My solution has been to just remember the important stuff, or at least where to find it. I'm not dead yet so I guess it works.

By TeMPOraL 2025-05-2820:581 reply

It was my solution too, and I liked it, but over the past decade or so, I noticed that even when I remember where to find some stuff, hell, even if I just remember how to find it, when I actually try and find it, it often isn't there anymore. "Search rot" is just as big a problem as link rot.

As for being still alive, by that measure hardly anything anyone does is important in the modern world. It's pretty hard to fail at thinking or remembering so badly that it becomes a life-or-death thing.

By 90s_dev 2025-05-2823:31

> hardly anything anyone does is important

Agreed.

By mock-possum 2025-05-295:072 reply

I’ve found that whenever I think “why don’t other people just do X” it’s because I’m misunderstanding what’s involved in X for them, and that generally if they could ‘just’ do X then they would.

“Why don’t you just” is a red flag now for me.

By 90s_dev 2025-05-2913:55

Not always. I love it when people offer me a much simpler solution to a problem I overengineered, so I can throw away my solution and use the simpler one.

Half the time people are suggested a better way, it's because they're actually doing it wrong, they've gotten the solution's requirements all wrong in the first place, and this perspective helps.

By chii 2025-05-296:06

this applies to basically any suggested solution to any problem.

"Why don't you just ..." is just lazy idea suggestion from armchair internet warriors.

By mycall 2025-05-290:231 reply

Is there some browser extension that automatically goes to web.archive.org if the link timesout?

By theblazehen 2025-05-298:47

I use the Resurrect Pages addon

By account42 2025-06-028:07

I really is a travesty that Browsers still haven't updated their bookmark feature based on this realization - all bookmarks should store not only the link but a full copy of the rendered page (not just the source which could rely on dynamic content that will no longer be available).

Also, open tabs should work the same way: I never want to see a network error while going back to a tab while not having an internet connection because the browser has helpfully evicted that tab from memory. It should just reload the state from disk instead of the network in this case until I manually refresh the page.

By macawfish 2025-05-2820:451 reply

Use WARC: https://en.wikipedia.org/wiki/WARC_(file_format) with WebRecorder: https://webrecorder.net/

By shwouchk 2025-05-290:031 reply

warc is not a panacea; for example, gemini makes it super annoying to get a transcript of your conversation, so i started saving those as pdf and warc.

turns out that unlike most webpages, the pdf version is only a single page of what is visible on screen.

turns out also that opening the warc immediately triggers a js redirect that is planted in the page. i can still extract the text manually - it’s embedded there - but i cannot “just open” the warc in my browser and expect an offline “archive” version - im interacting with a live webpage! this sucks from all sides - usability, privacy, security.

Admittedly, i don’t use webrecorder - does it solve this problem? did you verify?

By weinzierl 2025-05-297:171 reply

Not sure if you tried that. Chrome has a take full page screenshot command. Just open the command bar in dev tools and search for "full" and you will fund it. Firefox has it right in the context menu, no need for dev tools.

Unfortunately there are sites where it does not work.

By eMPee584 2025-05-299:39

Apart from small UX nits, FF's screenshot feature is great - it's just that storing a 2-15MiB bitmap copy of a text medium still feels dirty to me every time.. would much prefer a PDF export, page size matching the scroll port, with embedded fonts and vectors and without print CSS..

By andai 2025-05-2821:151 reply

Is there some kind of thing that turns a web page into a text file? I know you can do it with beautiful soup (or like 4 lines of python stdlib), but I usually need it on my phone, where I don't know a good option.

My phone browser has a "reader view" popup but it only appears sometimes, and usually not on pages that need it!

Edit: Just installed w3m in Termux... the things we can do nowadays!

By XorNot 2025-05-2821:171 reply

You want Zotero.

It's for bibliographies, but it also archives and stores web pages locally with a browser integration.

By _huayra_ 2025-05-293:37

I frankly don't know how I'd collect any useful info without it.

I'm sure there are bookmark services that also allow notes, but the tagging, linking related things, etc, all in the app is awesome, plus the ability to export bib tex for writing a paper!

By m-p-3 2025-06-0214:23

I export text-based content I want to retain into Markdown files, and when I find something useful for work I also send the URL to the Wayback Machine.

By nonethewiser 2025-05-2818:40

A reference is a bet on continuity.

At a fundamental level, broken website links and dangling pointers in C are the same.

By jwe 2025-05-309:12

I can recommend to use Pinboard with the archive option

By taeric 2025-05-2822:39

That assumption isn't true of any sources? Things flat out change. Some literally, others more in meaning. Some because they are corrected, but there are other reasons.

Not that I don't think there is some benefit in what you are attempting, of course. A similar thing I still wish I could do is to "archive" someone's phone number from my contact list. Be it a number that used to be ours, or family/friends that have passed.

By rubit_xxx16 2025-05-2822:56

> Before 2010 I had this unquestioned assumption that links are supposed to last forever

Any site/company whatsoever of this world (and most) that promises that anything will last forever is seriously deluded or intentionally lying, unless their theory of time is different than that of the majority.

By mananaysiempre 2025-05-2816:543 reply

May be worth cooperating with ArchiveTeam’s project[1] on Goo.gl?

> url shortening was a fucking awful idea[2]

[1] https://wiki.archiveteam.org/index.php/Goo.gl

[2] https://wiki.archiveteam.org/index.php/URLTeam

By MallocVoidstar 2025-05-2820:23

IIRC ArchiveTeam were bruteforcing Goo.gl short URLs, not going through 'known' links, so I'd assume they have many/all of Compiler Explorer's URLs. (So, good idea to contact them)

By tech234a 2025-05-291:45

Real-time status for that project indicates 7.5 billion goo.gl URLs found out of 42 billion goo.gl URLs scanned: https://tracker.archiveteam.org:1338/status

By mattgodbolt 2025-05-303:53

Thanks! Someone posted on GitHub about that and I'll be looking at that tomorrow!

By s17n 2025-05-2817:584 reply

URLs lasting forever was a beautiful dream but in reality, it seems that 99% of URLs don't in fact last forever. Rather than endlessly fighting a losing battle, maybe we should build the technology around the assumption that infrastructure isn't permanent?

By nonethewiser 2025-05-2818:35

>maybe we should build the technology around the assumption that infrastructure isn't permanent?

Yes. Also not using a url shortener as infrastructure.

By dreamcompiler 2025-05-295:41

URNs were supposed to solve that problem by separating the identity of the thing from the location of the thing.

But they never became popular and then link shorteners reimplemented the idea, badly.

https://en.m.wikipedia.org/wiki/Uniform_Resource_Name

By hoppp 2025-05-2820:391 reply

Yes.

domain names often exchange hands and a URL that is supposed to last forever can turn into malicious phishing link over time.

By emaro 2025-05-2821:472 reply

In theory a content-addressed system like IPFS would be the best: if someone online still has a copy, you can get it too.

By mananaysiempre 2025-05-292:191 reply

It feels as though, much like cryptography in general reduces almost all confidentiality-adjacent problems to key distribution (which is damn near unsolvable in large uncoordinated deployments like Web PKI or PGP), content-addressable storage reduces almost all data-persistence-adjacent problems to maintenance of mutable name-to-hash mappings (which is damn near unsolvable in large uncoordinated deployments like BitTorrent, Git, or IP[FN]S).

By dreamcompiler 2025-05-295:461 reply

DNS seems to solve the problem of a decentralized loosely-coordinated mapping service pretty well.

By emaro 2025-05-299:171 reply

True, but then you're back on square one. Because it's not guaranteed that using a (DNS) name will point to the same content forever.

By hoppp 2025-05-2913:292 reply

But then all content should be static and never update?

If you serve an SPA via IPFS, the SPA still needs to fetch the data from an endpoint which could go down or change

Even if you put everything on a blockchain, an RPC endpoint to read the data must have a URL

By mananaysiempre 2025-05-2915:02

> But then all content should be static and never update?

And thus we arrive at the root of the conflict. Many users (that care about this kind of thing) want to publications that they’ve seen to stay where they’ve seen them; many publishers have become accustomed to being able to memory-hole things (sometimes for very real safety reasons; often for marketing ones). That on top of all the usual problems of maintaining a space of human-readable names.

By emaro 2025-05-3110:26

No, not all content should never change. This is just the core of the dilemma: dynamic content (and identifiers) rots faster that static content (content addressed). We can have both, but not at the same time.

By immibis 2025-05-299:061 reply

Note that IPFS is now on the EU Piracy Watchlist which may be a precursor to making it illegal.

By emaro 2025-05-3110:27

Didn't know that, interesting. Although maybe it's not that surprising...

By jjmarr 2025-05-2823:56

URL identify the location of a resource on a network, not the resource itself, and so are not required to be permanent or unique. That's why they're called "uniform resource locators".

This problem was recognized in 1997 and is why the Digital Object Identifier was invented.