
If anyone needs commands for turning off the CF proxy for their domains and happens to have a Cloudflare API token, here you go.
First you can grab the zone ID via:
curl -X GET "https://api.cloudflare.com/client/v4/zones" -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" | jq -r '.result[] | "\(.id) \(.name)"'
And a list of DNS records using: curl -X GET "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records" -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json"
Each DNS record will have an associated ID. Finally, patch the relevant records: curl -X PATCH "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" --data '{"proxied":false}'
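If it helps, here's a rough one-shot version that ties those three steps together. Untested sketch only: it assumes API_TOKEN and ZONE_NAME are already exported and that jq is installed.

  #!/usr/bin/env bash
  # Sketch: set proxied:false (grey-cloud) on every proxied record in one zone.
  set -euo pipefail
  API="https://api.cloudflare.com/client/v4"
  AUTH=(-H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json")

  # 1. Zone name -> zone ID
  ZONE_ID=$(curl -s "$API/zones?name=$ZONE_NAME" "${AUTH[@]}" | jq -r '.result[0].id')

  # 2. IDs of the records that are currently proxied (orange-clouded)
  RECORD_IDS=$(curl -s "$API/zones/$ZONE_ID/dns_records?per_page=100" "${AUTH[@]}" \
    | jq -r '.result[] | select(.proxied == true) | .id')

  # 3. Patch each record to proxied:false
  for RECORD_ID in $RECORD_IDS; do
    curl -s -X PATCH "$API/zones/$ZONE_ID/dns_records/$RECORD_ID" "${AUTH[@]}" \
      --data '{"proxied":false}' | jq -r '"\(.result.name): success=\(.success)"'
  done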
Copying from a sibling comment - some warnings:
- SSL/TLS: You will likely lose your Cloudflare-provided SSL certificate. Your site will only work if your origin server has its own valid certificate.
- Security & Performance: You will lose the performance benefits (caching, minification, global edge network) and security protections (DDoS mitigation, WAF) that Cloudflare provides.
- This will also reveal your backend (origin) IP addresses. Anyone can find permanent logs of the public IP addresses used by even obscure domain names, so potential adversaries don't necessarily have to be paying attention at exactly the right time to find them.
Also, for anyone who only has an old global API key lying around instead of the more recent tokens, you can set:
-H "X-Auth-Email: $EMAIL_ADDRESS" -H "X-Auth-Key: $API_KEY"
instead of the Bearer token header (quick example below). Edit: and in case you're like me and thought it would be clever to block all non-Cloudflare traffic hitting your origin... remember to disable that.
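For reference, the zone lookup from above with the legacy headers (assuming EMAIL_ADDRESS and API_KEY are set):

  curl -s "https://api.cloudflare.com/client/v4/zones" \
    -H "X-Auth-Email: $EMAIL_ADDRESS" \
    -H "X-Auth-Key: $API_KEY" \
    -H "Content-Type: application/json" | jq -r '.result[] | "\(.id) \(.name)"'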
This is exactly what we've decided we should do next time. Unfortunately we didn't generate an API token so we are sitting twiddling our thumbs.
Edit: seems like we are back online!
Took me ~30 minutes but eventually I was able to log in, get past the 2FA screen and change a DNS record.
I really missed having a valid API token today.
I'm able to generate keys right now through WARP. Login takes forever but it is working.
Awesome! I did it via the Terraform provider, but for anyone else without access to the dashboard this is great. Thank you!
If anyone needs the internet to work again (or to get into your CF dashboard to generate API keys): if you have Cloudflare WARP installed, turning it on appears to fix otherwise broken sites. Maybe using 1.1.1.1 does too, but flipping the toggle was faster. Some parts of sites are still down, even after tunneling in to CF.
super helpful. thanks!
Looks like I can get everywhere I couldn't before, except my Cloudflare dash.
It's absurdly slow (like multiple minutes for the login page to fully load so the login button is pressable, due to the captcha...), but I was able to log into the dashboard. It's throwing lots of errors once inside, but I can navigate around some of it. YMMV.
My profile (including API tokens) and websites pages all work; the Accounts tab above Websites on the left does not.
Good advice!
And there's no need for -X GET to make a GET request with curl; GET is the default HTTP method if you don't send any content.
If you do send content with, say, -d, curl will do a POST request, so there's no need for -X then either.
For PATCH though, it is the right curl option.
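In other words (nothing Cloudflare-specific, just curl defaults; httpbin.org is used to show the method):

  # GET is curl's default method, so these two are equivalent:
  curl "https://api.cloudflare.com/client/v4/zones" -H "Authorization: Bearer $API_TOKEN"
  curl -X GET "https://api.cloudflare.com/client/v4/zones" -H "Authorization: Bearer $API_TOKEN"

  # Sending a body with -d/--data flips the default to POST:
  curl -s "https://httpbin.org/anything" -d 'hello=world' | jq -r .method   # prints POST

  # PATCH (or PUT, DELETE, ...) still needs an explicit -X:
  curl -X PATCH "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
    -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" --data '{"proxied":false}'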
thanks for this! just expanded on a bit and published a write up here so it's easier to find in the future: https://www.coryzue.com/writing/cloudflare-dns/
I would advise against this action. Just ride the crash.
If people knew how to play the 5 hour long game they wouldn't have been using Cloudflare in the first place.
A colleague of mine just came bursting through my office door in a panic, thinking he brought our site down since this happened just as he made some changes to our Cloudflare config. He was pretty relieved to see this post.
Tell him it's worse than he thinks. He obviously brought the entire Cloudflare system down.
You joke and I think it's funny, but as a junior engineer I would be quite proud if some small change I made was able to take down the mighty Cloudflare.
If I were Cloudflare it would mean an immediate job offer well above market. That junior engineer is either a genius or so lucky that they must be bred by Pierson’s Puppeteers or such a perfect manifestation of a human fuzzer that their skills must be utilized.
This reminds me of a friend I had in college. We were assigned to the same group coding an advanced calculator in C. This guy didn't know anything about programming (he was mostly focused on his side biz of selling collector sneakers), so we assigned him to do all the testing: his job was to come up with weird equations and weird but valid ways to present them to the calculator. And this dude somehow managed to crash almost all of our iterations except the last few. Really put the joke where a programmer, a tester, and a customer walk into a bar into perspective.
I love that he ended up making a very valuable contribution despite not knowing how to program -- other groups would have just been mad at him, had him do nothing, or had him do programming and gotten mad when it was crap or not finished.
I never thought I'd get the chance, but then my Claude Code on Web credits ran out and I had to find another way to entertain myself.
I think the rate limits for Claude Code on the Web include VM time in general and not just LLM tokens. I have a desktop app with a full end to end testing suite which the agent would run for every session that probably burned up quite a bit.
Internet points demand obscure references these days. My system prompt has its own area code
> If I were Cloudflare it would mean an immediate job offer well above market.
And not a lawsuit? Cause I've read more about that kind of reaction than of job offers. Though I guess lawsuits are more likely to be controversial and talked about.
I kind of did that back when they released Workers KV: I tried to bulk upload a lot of data and it brought the whole service down. Can confirm, I was proud :D
It's also not exactly the least common way that this sort of huge multi-tenant service goes down. It's only as rare as it is because more or less all of them have had such outages in the past and built generic defenses (e.g. automated testing of customer changes, gradual rollout, automatic rollback, there are others but those are the ones that don't require any further explanation).
You might want to consider migrating to Azure Front Door if that's a feature you like: https://www.infoq.com/news/2025/11/azure-afd-control-plane-f...
Well, it's easy to cause damage by messing up the `rm` command, especially with the `-fr` options. So don't take it as a proxy for some great skill being required to cause damage.
You could easily cause great damage to your Cloudflare setup, but CF has measures to prevent random customers deleting stuff from taking down the entire service globally. Unless you have admin access to the entire CF system, you can't really cause much damage with rm.
> You joke and I think it's funny, but as a junior engineer I would be quite proud if some small change I made was able to take down the mighty Cloudflare.
I mean, with Cloudflare's recent (lack of) uptime, I would argue there's a degree of crashflation happening, such that there's less prestige in doing so. Nowadays if a lawnmower drives by Cloudflare and backfires, that's enough to collapse the whole damn thing.
Are you actually so mind-numbingly ignorant that you think Rebecca Heineman had a brother named Bill, that you would rudely and incorrectly try to correct people who knew her story well, during a memorial discussion of her life and death?
Or were you purposefully going out of your way to perpetrate performative ignorance and transphobic bullying, just to let everyone know that you're a bigoted transphobic asshole?
I don't buy that it was an innocent mistake, given the context of the rest of the discussion, and your pretending to know her family better than the poster you were replying to and everyone else in the discussion, falsely denying her credit for her own work. Do you really think dang made the Hacker News header black because he and everyone else was confused and you were right?
Do you like to show up at funerals of people you don't know, just to interrupt the eulogy with insults, stuff pennies up your ass (as you claim to do), then shit and piss all over the coffin in front of their family and friends?
How long did you have to wait until she died before you had the courage to deadname, misgender, and punch down at her in a memorial, out of hate and cowardice and a perverse desire to show everyone what kind of a person you really are?
Next time, can you at least wait until after the funeral before committing your public abuse?
https://news.ycombinator.com/item?id=45975524
amypetrik8, 13 hours ago [flagged] [dead], on: Rebecca Heineman has died
The work you're outlining here is was performed by "Bill Heineman" - maybe you are mixing up Bill with his sister Rebecca?!?
Can you calm down with the absolutely mental rants against people in unrelated threads? Cuckoo crazy behavior.
Posting abusive bigoted bullshit in a memorial thread is cuckoo crazy behavior. Calling it out and describing it isn't. You're confusing describing the abuse with committing the abuse. Direct your scorn at the person I'm criticizing, unless you agree with what they did, in which case my criticism also applies directly and personally to you, so no wonder you created a throw away sock puppet account just to attempt to defend your own bigotry and abuse.
Have you ranted at John Carmack yet?
Well, you can never be sure that he didn't:
It's also what caused the Azure Front Door global outage two weeks ago - https://aka.ms/air/YKYN-BWZ
"A specific sequence of customer configuration changes, performed across two different control plane build versions, resulted in incompatible customer configuration metadata being generated. These customer configuration changes themselves were valid and non-malicious – however they produced metadata that, when deployed to edge site servers, exposed a latent bug in the data plane. This incompatibility triggered a crash during asynchronous processing within the data plane service. This defect escaped detection due to a gap in our pre-production validation, since not all features are validated across different control plane build versions."
It's actually pretty nice and amazing that they publish video-format incident retrospectives.
Oh don't you worry. We are very much talking about the global outage as if he was the root cause. Like good colleagues :)
Hmm, wait a minute.. maybe he was the cause! (no, kidding. just upping the pressure as a good peer :)
are we truly good if we don't start a class action suit against this hapless scapegoat?!
Just join the one we've started over in this cubicle!
> May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances.
I'd love to know more about what those specific circumstances were!
I'm pretty sure I crashed Gmail using something weird in its filters. It was a few years ago. Every time I did something specific (I don't remember what), it would freeze and then display a 502 error for a while.
Damn, imagine being the customer responsible for that, oof
What do you imagine would be the result if you brought down cloudflare with a legitimate config update (ie not specifically crafted to trigger known bugs) while not even working for them? If I were the customer "responsible" for this outage, I'd just be annoyed that their software is apparently so fragile.
I would be fine if it was my "fault", but I'm sure people in business would find a way to make me suffer.
But on a personal level, this is like ordering something at a restaurant and the cook burning down the kitchen because they forgot to take your pizza out of the oven or something.
I would be telling it to everyone over beers (but not my boss).
I would be tempted to put it on my CV :D
Is there a word for that feeling of relief when someone else fucked up after initially thinking it was you?
What's funny is, as I get older, this feeling of relief turns more into a feeling of dread. The nice thing about problems that you cause is that you have considerable autonomy to fix them. When Cloudflare goes down, you're sitting and waiting for a third party to fix something.
Why is it dread? I always feel good when big players mess up, as it makes me feel better about my own mess ups in life previously.
Can’t speak for GP but ultimately I’d rather it be my fault or my company’s fault so I have something I can directly do for my customers who can’t use our software. The sense of dread isn’t about failure but feeling empathy for others who might not make payroll on time or whatever because my service that they rely on is down. And the second order effects, like some employee of a customer being unable to make rent or be forced to take out a short term loan or whatever. The fallout from something like this can have an unexpected human cost at times. Thankfully it’s Tuesday, not a critical payroll day for most employees.
But why does this case specifically matter? What if their system was down due to their WiFi or other layers beyond your software? Would you feel the same as well?
What about all the other systems and people suffering elsewhere in the World?
I don't understand what point you're trying to make. Are you suggesting that if I can't feel empathy for everybody at once, or in every one of their circumstances, that I should not feel anything at all for anyone? That's not how anything works. Life (or, as I believe, God) brings us into contact with all kinds of people experiencing different levels of joy and pain. It's natural to empathize with the people you're around, whatever they're feeling. Don't over-complicate it.
Because my customers don't care (and shouldn't) that it's a third party. If I caused it there is a chance I can fix it.
So you would rather be incompetent than powerless? Choice of third party vendor on client facing services is still on you, so maybe you prefer your incompetence be more direct and tangible?
Even still, you should have policies in place to mitigate such eventualities, that way you can focus the incompetence into systematic issues instead. The larger the company, the less acceptable these failures become. Lessons learned is a better excuse for a shake and break startup than an established player that can pay to be secure.
At some point, the finger has to be pointed. Personally, I don't dread it pointing elsewhere. Just means I've done my due D and C.
Your priority (in this comment at least) is the finger-pointing, while the parent's priority is wanting a fix to the issue at hand.
If customers expected third-party downtime not to affect their thing, then you shouldn't have picked a third-party provider, or should have spent extra resources on avoiding a single point of failure. If they were happy with choosing the third party, knowing you'd depend on said provider, then it was an accepted risk.
When others cause problems then you can put your feet up and surf the web waiting for resolution. Oh, wait.
The problem is, I still get the wrong end of the stick when AWS or CF go down! Management doesn't care, understandably. They just want the money to keep coming in. It's hard to convince them that this is a pretty big problem. The only thing that will calm them down a bit is to tell them Twitter is also down. If that doesn't get them, I say ChatGPT is also down. Now NOBODY will get any work done! lol.
This is why you ALWAYS have a proposal ready. I literally had my ass saved by having tickets with reliability/redundancy work clearly laid out, with comments from out-of-touch product/people managers deprioritizing the work after attempts to pull it off the backlog (in one infamous case, for a notoriously poorly conceived and expensive failure of a project that haunted us again in lost opportunity cost).
The hilarious part of the whole story is that the same PMs and product managers were (and I cannot overemphasize this enough) absolutely militant orthodox agile practitioners with Jira.
Every time a major cloud goes down, management tells us why don't we have a backup service that we can switch to. Then I tell them that a bunch of services worth a lot more than us are also down. Do you really want to spend the insane amount of resources to make sure our service stays up when the global internet is down?
Having an alt to Cloudflare isn’t preposterous.
Who decided to go with AWS or CF? If it's a management decision, tell them you need the resources to have a fallback if they want their system to be more reliable than AWS or CF.
Haha yeah I just got off the phone and I said, look, either this gets fixed soon or there's going to be news headlines with photographs of giant queues of people milling around in airports.
When I'm debugging something, I'm not usually looking for the solution to the problem; I'm looking for sufficient evidence that I didn't cause the problem. Once I have that, the velocity at which I work slows down.
My manager once asked if he could have a "quick word". I said "velocity".
Well, at least something good came out of this incident.
Perfect.
Yup, that works.
Is there a word for a feeling that there's gotta be a German word for this niche feeling?
Probably Deutschwortsehnsucht (https://www.iamexpat.de/education/education-news/german-word...)
You mean like when the Wortzusammensetzungsverdacht just hits you? (yeah, I just made that up, that's the beauty)
Fremdverfehlungserleichterung?
Entlastungvergnugen?
puhphorie
Maybe this isn’t great, but I get a hint of that feeling when I’m on an airplane and hear a baby crying. For a number of years, if I heard a baby crying, it was probably my baby and I had to deal with it. But now my kids are past that phase, so when I hear the crying, after that initial jolt of panic I realize that it isn’t my problem, and that does give me the warm fuzzies. Even though I do feel bad for the baby and their parents.
Related situation: you're at a family gathering and everyone has young kids running around. You hear a thump, and then some kid starts screaming. Conversation stops and every parent keenly listens to the screams to try and figure out whose kid just got hurt, then some other parent jumps up - it's not your kid! #phewphoria
You're not alone in this feeling. I occasionally smile when it's not my kid.
This is one of the secret joys of being a parent.
The German word “schadenfreude” means taking pleasure in someone else’s misfortune; enjoyment rather than relief.
since schaden is damage and freude is joy, not sure what it should be - maybe Schadeleichtig hmm...
>maybe Schadeleichtig
Maybe "Erleichterung" (relief)? But as a German "Schadenserleichterung" (also: notice the "s" between both compound word parts) rather sounds like a reduction of damage (since "Erleichterung" also means mitigation or alleviation).
Right, I thought of that at first and discarded it for that reason. The real problem is that in the usual telling of how Schadenfreude works, as a bit of German-language how-to, the component that it's other people's damage sparking the joy is missing from the word itself; that interpretation has to already be known by the word's user. If you were just creating the word and nobody in the world had heard it before, it would be pretty reasonable for people to think you had just coined a new word for masochism.
Schadenfriend?
You gain relief, but you don't exactly derive pleasure as it's someone you know that's getting the ass end of the deal
It's close enough to Schadenfreude but not really.
Schadenfreude
Nah, that's delight in someone else's misfortune. This is delight that the misfortune wasn't yours, which is slightly different.
4 years of German and I still don't quite "get" it :^) TY!
We have a saying:
You know how you measure eternity?
When you finish learning German.
Not quite, that’s more like taking pleasure in the misfortune of someone else. It’s close, but the specific relief bit that it is not _your_ misfortune is not captured
The internet apparently isn't that big :-)
Cousin! How brilliant to see you here. :)
vindication?
schadenfuckup
The company where this colleague works? Cloudflare.
I woke up getting bombarded by messages from multiple clients about sites not working. I shat my pants because I'd changed the config just yesterday. When I saw the status message "Cloudflare down" I was so relieved.
Good that he worked it out so quick. I recently spent a day debugging email problems on Railway PaaS, because they silently closed an SMTP port without telling anyone.
How do we know your colleague's changes didn't take down Cloudflare, though?
Do you guys work at Cloudflare? Do you mind reverting that change just in case?
Chances are still good that somewhere within Cloudflare someone really did do a global configuration push that brought down the internet.
When aliens study humans from this period, their book of fairy tales will include several where a terrible evil was triggered by a config push.
Plot twist: They work at Cloudflare
Is Cloudflare being down the work of conservative hackers and the rest of the internet is just collateral damage?
Wait for the post mortem... It is a technical possibility: a race condition propagates one customer's config to all nodes... :-)
Did your colleague perhaps change the Cloudflare config again right now? Seems to be down again.
You should tell him his config change took down half the internet.
You missed a great opportunity to dead-pan him with something like "No, Bob, not just our site, you brought down the entire Internet, look at this post!"
> In short, a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change we made. That cascaded into a broad degradation to our network and other services. This was not an attack.
From the CTO, Source: https://x.com/dok2001/status/1990791419653484646
It still astounds me that the big dogs do not phase config rollouts. Code is data, configs are data; they are one and the same. It was the same issue with the giant CrowdStrike outage last year: they were rawdogging configs globally, a bad config made it out there, and everything went kaboom.
You NEED to phase config rollouts like you phase code rollouts.
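For anyone who hasn't built one, the basic shape is simple. This is only a sketch - push_config and health_check are made-up helpers, not anything Cloudflare actually runs:

  # Sketch of a phased config rollout with automatic rollback.
  OLD_CONFIG="$1"; NEW_CONFIG="$2"
  for PHASE in canary 1pct 10pct 50pct 100pct; do
    push_config "$PHASE" "$NEW_CONFIG"      # apply only to this slice of the fleet
    sleep 300                               # bake time before widening the blast radius
    if ! health_check "$PHASE"; then        # error rates, crash loops, etc.
      push_config "$PHASE" "$OLD_CONFIG"    # roll back the affected slice
      echo "rollout halted at phase $PHASE" >&2
      exit 1                                # never touch the remaining phases
    fi
  done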
The big dogs absolutely do phase config rollouts as a general rule.
There are still two weaknesses:
1) Some configs are inherently global and cannot be phased. There's only one place to set them. E.g. if you run a webapp, this would be configs for the load balancer as opposed to configs for each webserver
2) Some configs have a cascading effect -- even though a config is applied to 1% of servers, it affects the other servers they interact with, and a bad thing spreads across the entire network
> Some configs are inherently global and cannot be phased
This is also why "it is always DNS". It's not that DNS itself is particularly unreliable, but rather that it is the one area where you can really screw up a whole system by running a single command, even if everything else is insanely redundant.
I don't believe there is anything that necessarily requires DNS configs to be global.
You can shard your service behind multiple names:
my-service-1.example.com
my-service-2.example.com
my-service-3.example.com …
Then you can create smoke tests which hit each phase of the DNS and if you start getting errors you stop the rollout of the service.
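Something like this, roughly (sketch only; apply_dns_change and the /healthz path are made up):

  # Roll the DNS change out shard by shard; stop at the first unhealthy shard.
  for N in 1 2 3; do
    HOST="my-service-$N.example.com"
    apply_dns_change "$HOST"                  # hypothetical: update only this shard's record
    sleep 60                                  # let the change propagate / caches expire
    if ! curl -sf --max-time 10 "https://$HOST/healthz" > /dev/null; then
      echo "$HOST unhealthy after DNS change - stopping rollout" >&2
      exit 1
    fi
  done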
Sure, but that doesn't really help for user-facing services where people expect to either type a domain name in their browser or click on a search result, and end up on your website every time.
And the access controls of DNS services are often (but not always) not fine-grained enough to actually prevent someone from ignoring the procedure and changing every single subdomain at once.
> Sure, but that doesn't really help for user-facing services where people expect to either type a domain name in their browser or click on a search result, and end up on your website every time.
It does help. For example, at my company we have two public endpoints:
company-staging.com
company.com
We roll out changes to company-staging.com first and have smoke tests which hit that endpoint. If the smoke tests fail we stop the rollout to company.com.
Users hit company.com
That doesn’t help with rolling out updates to the DNS for company.com which is the point here. It’s always DNS because your pre-production smoke tests can’t test your production DNS configuration.
If I'm understanding it right, the idea is that the DNS configuration for company-staging.com is identical to that for company.com - same IPs and servers, DNS provider, domain registrar. Literally the only differences are s/company/company-staging/, all accesses should hit the same server with the same request other than the Host header.
Then you can update the DNS configuration for company-staging.com, and if that doesn't break there's very little scope for the update to company.com to go differently.
The purpose of a staged rollout is to test things with some percentage of actual real-world production traffic, after having already thoroughly tested things in a private staging environment. Your staging URL doesn't have that. Unless the public happens to know about it.
The scope for it to go wrong is the differences in real-world and simulation.
It's a good thing to have, but not a replacement for the concept of staged rollout.
But users are going to example.com. Not my-service-33.example.com.
So if you've got some configuration that has a problem that only appears at the root-level domain, no amount of subdomain testing is going to catch it.
I think it's uncharitable to jump to the conclusion that just because there was a config-based outage they don't do phased config rollouts. And even more uncharitable to compare them to crowdstrike.
I have read several cloudflare postmortems and my confidence in their systems is pretty low. They used to run their entire control plane out of a single datacenter which is amateur hour for a tech company that has over $60 billion in market cap.
I also don’t understand how it is uncharitable to compare them to crowdstrike as both companies run critical systems that affect a large number of people’s lives, and both companies seem to have outages at a similar rate (if anything, cloudflare breaks more often than crowdstrike).
https://blog.cloudflare.com/18-november-2025-outage/
> The larger-than-expected feature file was then propagated to all the machines that make up our network
> As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network.
I was right. Global config rollout with bad data. Basically the same failure mode as CrowdStrike.
It seems fairly logical to me? If a config change causes services to crash then the rollout stops… at least in every phased rollout system I've ever built…
At a company I am no longer with, I argued much the same when we rolled out "global CI/CD" on IaC. You made one change, committed and pushed, and wham, it's on 40+ server clusters globally. I hated it. The principal was enamored with it, "cattle not pets" and all that, but the result was that things slowed down considerably because anyone working with it became terrified of making big changes.
Then you get customer visible delays.
Because adversaries adapt quickly, they have a system that deploys their counter-adversary bits quickly without phasing - no matter whether they call them code or configs. See also: Crowdstrike.
You can't protect against _latent bugs_ with phased rollouts.
Wish this could rocket to the top of the comment thread, digging through hundreds of comments speculating about a cyberattack to find this felt silly
Configuration changes are dangerous for CF it seems, and knocked down $NET almost 4% today. I wonder what the industry wide impact is for each of these outages?
Pre-market was red for all tech stocks today, before the outage even happened.
Yes, if anything it's bullish on CloudFlare because many investors don't realize how pervasive it is.
>Configuration changes are dangerous for CF it seems, and knocked down $NET almost 4% today. I wonder what the industry wide impact is for each of these outages?
This is becoming the "new normal." It seems like every few months, there's another "outage" that takes down vast swathes of internet properties, since they're all dependent on a few platforms and those platforms are, clearly, poorly run.
This isn't rocket surgery here. Strong change management, QA processes and active business continuity planning/infrastructure would likely have caught this (or not), as is clear from other large platforms that we don't even think about because outages are so rare.
Like airline reservations systems[0], credit card authorization systems from VISA/MasterCard, American Express, etc.
Those systems (and others) have outages in the "once a decade" or even much, much longer ranges. Are the folks over at SABRE and American Express that much smarter and better than Cloudflare/AWS/Google Cloud/etc.? No. Not even close. What they are is careful, as they know their business is dependent on making sure their customers can use their services anytime/anywhere, without issue.
It amazes me the level of "Stockholm Syndrome"[1] expressed by many posting to this thread, expressing relief that it wasn't "an attack" and essentially blaming themselves for not having the right tools (API keys, etc.) to recover from the gross incompetence of, this time at least, Cloudflare.
I don't doubt that I'll get lots of push back from folks claiming, "it's hard to do things at scale," and/or "there are way too many moving parts," and the like.
Other organizations like the ones I mention above don't screw their customers every 4-6 months with (clearly) insufficiently tested configuration and infrastructure changes.
Yet many here seem to think that's fine, even though such outages are often crushing to their businesses. But if the customers of these huge providers don't demand better, they'll only get worse. And that's not (at least in my experience) a very deep or profound idea.
[0] https://en.wikipedia.org/wiki/Airline_reservations_system