> How will you store my data and who can access it?
> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.
> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.
So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.
Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.
No, there is a whole news cycle about how chats you delete aren't actually being deleted because of a lawsuit; they essentially have to respond. It's not an attempt to spin the lawsuit; it's about reassuring their customers.
The part where they go out of their way to call the lawsuit baseless is spin though, and mixing that with this messaging presents a mixed message. The NYT lawsuit is objectively not baseless. OpenAI did train on the Times, and ChatGPT does output information from that training. That's the basis of the lawsuit. NYT may lose, this could end up being considered fair use, and it might ultimately be a flimsy basis for a lawsuit, but to say it's baseless (and with nothing to back that up) is spin and makes this message less reassuring.
No, it's not. It's absolutely standard corporate communications. If they're fighting the lawsuit, that is essentially the only thing they can say about it. Ford Motor Company would say the same thing (well, they'd probably say "meritless and frivolous").
No, this isn't even close to spin, it's just a standard part of defending your case. In the US tort system you need to be constantly publicly saying you did nothing wrong. Any wavering on that point could be used against you in court.
This is a funny thread. You say "No" but then restate the point with slightly different words. As if anything a company says publicly about ongoing litigation isn't spin.
I suppose it's down to how you define "spin". Personally I'm in favor of a definition of the term that doesn't excessively dilute it.
Can you share your definition? This is actually quite puzzling, because as far as I know "spin" has always been associated with presenting things in a way that benefits you. Like, decades ago, they could have the show "Bill O'Reilly's No Spin Zone" and everybody knew the premise was that they'd argue against guests who were trying to tell a "massaged" version of the story, and that they'd go for some actual truth (fwiw I thought the whole show was full of crap, but the name was not confusing or ambiguous).
I’m not aware of any definition of “spin” where being conventional is a defense against that accusation. Actually, that was the (imagined) value-add of the show, that conventional corporate and political messaging is heavily spun.
Spin, like you illustrate in your comment, has connotations of distorting the truth.
Simply denying the allegations isn't really spinning anything; it's just denying the allegations. And the thing I dislike about characterizing something like this as spin is that it defangs the term by removing all those connotations and instead turning it into just a buzzwordy way of saying, "I disagree with what this person said."
They didn’t just deny the allegations. They called the case baseless. The case is clearly not baseless, in the sense that there’s at least enough of a basis that the court didn’t vacate the order to preserve the chats.
It seems to me that the discussion of whether or not it is spin has turned into a discussion of which party people basically agree with.
My personal opinion is that OpenAI will probably win, or at least get away with a pretty minor fine or something like that. However, the communications coming from both parties in the case should be assumed to be corporate spin until proven otherwise. And, calling an unfinished case baseless is, at least, a bit presumptuous!
That's legalese. You can't interpret legal jargon using vernacular definitions of the terms.
The source is a message intended for mass consumption, so it should not be interpreted in legalese.
How you want the law to work, and how the law works, are not necessarily the same thing.
There's a difference between "we are choosing to phrase it this way" versus "our lawyers told us we have to say this". "Spin" is generally seen as a voluntary action, which makes the former a clearcut case of it, the latter less so.
1) taking your lawyer’s advice is a voluntary action (although it is probably a good one)
2) I don't understand the distinction being made between voluntary and involuntary, in the sense that a corporation is a thing made up of people; it doesn't have a will in and of itself, so the communications it sends must always actually be made by somebody inside the corporation (whether a lawyer, a marketing person, or, in the unlikely event that somebody lets them out, an engineer).
No? "Spin" implies there was something else they could possibly say.
Indeed. Taken to its conclusion, this thread suggests that corporations are justified in saying whatever they want in order to further their own ends.
Including lies.
I'd like to aim a little higher, maybe towards expecting correspondence with reality?
IOW, yes, there is no law saying OpenAI can't try to spin this. But it's still a shitty, non-factually-based choice to make.
I haven't heard that interpretation; I might call it spin of spin.
If you're being held at gunpoint and forced to lie, your words are still a lie. Whether you were forced or not is a separate dimension.
That is unrelated to what the expression means.
I’m typing these words from a brain that has absorbed copyrighted works.
My understanding is that they have to keep chats based on an order, *as a result of their previous accidental deletion of potential evidence in the case*[0].
And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]
[0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[1] https://help.openai.com/en/articles/8809935-how-to-delete-an...
They should include the part where the order is a result of them deleting things they shouldn’t have then. You know, if this isn’t spin.
Then again, I'm starting to think OpenAI is gathering a cult-leader-like following, where any negative comment will result in devoted followers, or those with something to gain, immediately jumping to its defense no matter how flimsy the ground.
>They should include the part where the order is a result of them deleting things they shouldn’t have then. You know, if this isn’t spin.
From what I can tell from the court filings, prior to the judge's order to retain everything, the request to retain everything was coming from the plaintiff, with openai objecting to the request and refusing to comply in the meantime. If so, it's a bit misleading to characterize this as "deleting things they shouldn’t have", because what they "should have" done wasn't even settled. That's a bit rich coming from someone accusing openai of "spin".
Here’s a good article that explains what you may be missing.
https://techcrunch.com/2024/11/22/openai-accidentally-delete...
Your linked article talks about openai deleting training data. I don't see how that's related to the current incident, which is about user queries. The ruling from the judge for openai to retain all user queries also didn't reference this incident.
Sure.
Without this devolving into a tit for tat: the article explains, for those following this conversation, why it's been elevated to a court order and not just an expectation to preserve.
> the article explains for those following this conversation why it’s been elevated to a court order
That article does nothing of the sort and, indeed, it is talking about a completely separate incident of deleting data.
No worries. I can’t force understanding on anyone.
Here. I had an LLM summarize it for you.
A court order now requires OpenAI to retain all user data, including deleted ChatGPT chats, as part of the ongoing copyright lawsuit brought by The New York Times (NYT) and other publishers[1][2][6][7]. This order was issued because the NYT argued that evidence of copyright infringement—such as AI outputs closely matching NYT articles—could be lost if OpenAI continued its standard practice of deleting user data after 30 days[2][6][7].
This new requirement is directly related to a 2024 incident where OpenAI accidentally deleted critical data that NYT lawyers had gathered during the discovery process. In that incident, OpenAI engineers erased programs and search result data stored by NYT's legal team on dedicated virtual machines provided for examining OpenAI's training data[3][4][5]. Although OpenAI recovered some of the data, the loss of file structure and names rendered it largely unusable for the lawyers’ purposes[3][5]. The court and NYT lawyers did not believe the deletion was intentional, but it highlighted the risks of relying on OpenAI’s internal data retention and deletion practices during litigation[3][4][5].
The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7]. The order aims to prevent any further loss of potentially relevant information as the case proceeds. OpenAI is appealing the order, arguing it conflicts with user privacy and their established data deletion policies[1][2][6][7].
Sources:
[1] OpenAI Appeals Court Order Requiring Retention of Consumer Data https://www.pymnts.com/artificial-intelligence-2/2025/openai...
[2] ‘An Inappropriate Request’: OpenAI Appeals ChatGPT Data Retention Court Order https://www.eweek.com/news/openai-privacy-appeal-new-york-ti...
[3] OpenAI Deletes Legal Data in a Lawsuit From the New York Times https://www.businessinsider.com/openai-delete-legal-data-law...
[4] NYT vs OpenAI case: OpenAI accidentally deleted case data https://www.medianama.com/2024/11/223-new-york-times-openai-...
[5] New York Times Says OpenAI Erased Potential Lawsuit Evidence https://www.wired.com/story/new-york-times-openai-erased-pot...
[6] How we're responding to The New York Times' data ... - OpenAI https://openai.com/index/response-to-nyt-data-demands/
[7] Why OpenAI Won't Delete Your ChatGPT Chats Anymore: New York ... https://coincentral.com/why-openai-wont-delete-your-chatgpt-...
[8] A Federal Judge Ordered OpenAI to Stop Deleting Data - Adweek https://www.adweek.com/media/a-federal-judge-ordered-openai-...
[9] OpenAI confronts user panic over court-ordered retention of ChatGPT logs https://arstechnica.com/tech-policy/2025/06/openai-confronts...
[10] OpenAI Appeals ‘Sweeping, Unprecedented Order’ Requiring It Maintain All ChatGPT Logs https://gizmodo.com/openai-appeals-sweeping-unprecedented-or...
[11] OpenAI accidentally deleted potential evidence in NY ... - TechCrunch https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[12] OpenAI's Shocking Blunder: Key Evidence Vanishes in NY Times ... https://www.eweek.com/news/openai-deletes-potential-evidence...
[13] Judge allows 'New York Times' copyright case against OpenAI to go ... https://www.npr.org/2025/03/26/nx-s1-5288157/new-york-times-...
[14] OpenAI Data Retention Court Order: Implications for Everybody https://hackernoon.com/openai-data-retention-court-order-imp...
[15] Sam Altman calls for 'AI privilege' as OpenAI clarifies court order to retain temporary and deleted ChatGPT sessions https://venturebeat.com/ai/sam-altman-calls-for-ai-privilege...
[16] Court orders OpenAI to preserve all ChatGPT logs, including deleted ... https://techstartups.com/2025/06/06/court-orders-openai-to-p...
[17] OpenAI deleted NYT copyright case evidence, say lawyers https://www.theregister.com/2024/11/21/new_york_times_lawyer...
[18] OpenAI slams court order to save all ChatGPT logs, including ... https://simonwillison.net/2025/Jun/5/openai-court-order/
[19] OpenAI accidentally deleted potential evidence in New York Times ... https://mashable.com/article/openai-accidentally-deleted-pot...
[20] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://news.ycombinator.com/item?id=44185913
[21] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://arstechnica.com/tech-policy/2025/06/openai-says-cour...
[22] After court order, OpenAI is now preserving all ChatGPT and API logs https://www.reddit.com/r/LocalLLaMA/comments/1l3niws/after_c...
[23] OpenAI accidentally erases potential evidence in training data lawsuit https://www.theverge.com/2024/11/21/24302606/openai-erases-e...
[24] OpenAI "accidentally" erased ChatGPT training findings as lawyers ... https://www.reddit.com/r/aiwars/comments/1gwxr94/openai_acci...
[25] OpenAI appeals data preservation order in NYT copyright case https://www.reuters.com/business/media-telecom/openai-appeal...
You linked this article:
https://techcrunch.com/2024/11/22/openai-accidentally-delete...
Gruez said that is talking about an incident in this case but unrelated to the judge's order in question.
You said the article "explains for those following this conversation why it’s been elevated to a court order" but it doesn't actually explain that. It is talking about separate data being deleted in a different context. It is not user chats and access logs. It is the data that was used to train the models.
I pointed that out a second time since it seemed to be misunderstood.
Then you posted an LLM summary of something unrelated to the point being made.
Now we're here.
As you say, one cannot force understanding on another; we all have to do our part. ;)
Edit:
> The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7].
What did you prompt the LLM with for it to reach this conclusion? The [2][6][7] citations similarly don't seem to explain how that incident from months ago informed the judge's recent decision. Anyway, I'm not saying the conclusion is wrong, I'm saying the article you linked does not support the conclusion.
I think in your rush to reply you may have not read the summarization.
Calm down, cool off, and read it again.
The point is that the circumstances of the incident in 2024 are directly related to the how and why of the NYT lawyers' request and the judge's order.
The article I linked was to the incident in 2024.
Not everything has to be about pedantry and snark, even on HN.
Edit: I see you edited your response after re-reading the summarization. I’m glad cooler heads have prevailed.
The prompt was simply “What is the relation, if any, between OpenAI being ordered to retain user data and the incident from 2024 where OpenAI accidentally deleted the NYT lawyers data while they were investigating whether OpenAI had used their data to train their models?”
> I see you edited your response after re-reading the summarization.
Just to be clear, the summary is not convincing. I do understand the idea but none of the evidence presented so far suggests that was the reason. The court expected that the data would be retained, the court learned that it was not, the court gave an order for it to be retained. That is the seeming reason for the order.
Put another way: if the incident last year had not happened, the court would still have issued the order currently under discussion.
> It's not an attempt to spin the lawsuit; it's about reassuring their customers.
It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.
It would be extremely unusual (and likely very stupid) for the defendant in a lawsuit to post publicly that the plaintiff maybe has a point.
Why does OpenAI have any obligation to present the NYTs side?
Who said 'obligation'?
It's hard to reassure your customers if you can't address the elephant in the room. OpenAI brought this on themselves by flouting copyright law and assuring everyone else that such aggressive and probably-illegal action would be retroactively acceptable once they were too big to fail.
If the stored data is found to be relevant to the lawsuit during discovery, it becomes available to at least both parties involved and the court, as far as I understand.
Obviously OpenAI's point of view will be their point of view. They are going to call this lawsuit baseless, or else they would not be fighting it.
To me it's pretty clear how this will play out. You will need to buy additional credits or subscriptions through these LLMs that feed payments back to the likes of the NYT and book publishers. It's all stolen. I don't even want to hear it. This company doesn't want to pay up and is willing to let users' privacy hang in the balance to draw the case out until they get sure footing with their device launches or the like (or additional markets like enterprise, etc.).
> It's all stolen.
LLMs are not massive archives of data. The big models are a few TB in size. No one is forgoing a NYT subscription because they can ask ChatGPT to print out NYT news stories.
Regardless of the representation, some people are replacing news consumption generally with answers from ChatGPT.
Copyright is pretty narrowly tailored to verbatim reproduction of content so I doubt they will have to pay anything.
Even then, it's possible to prompt the model to exactly reproduce the copyrighted works.
Please show me one of these prompts
NYT has examples in their legal complaint. See page 30.
> So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.
I am not an Open AI stan, but this needs to be responded to.
The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.
This is like saying, "Well, I know they didn't want to go skydiving, but we forced them to go skydiving and they died because they had a stroke mid-air; it's their fault they died."
Anyone who makes promises about data security is at best incompetent and at worst dishonest.
Data is a toxic asset. -- https://www.schneier.com/essays/archives/2016/03/data_is_a_t...
> Anyone who makes promises about data security is at best incompetent and at worst dishonest.
Shouldn't that be "at best dishonest and at worst incompetent"?
I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?
An incompetent but honest person is more likely to accept correction and respond to feedback generally.
Maybe because you are not an OpenAI user. I am. I find it useful and I pay for it. I don't want my data to be retained beyond what's promised in the Terms of Use and Privacy Policy.
I don't think the judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.
You live on a pirate ship. You have no right to ignore the ethics and law of that just because you could be hurt in a conflict related to piracy.
The OpenAI Privacy Policy specifically allows them to keep data as required by law.
> who don't even care about NYT's content or bypassing their paywalls.
Whether or not you care is not relevant, and is usually the case for customers. If a drug company resold an expensive cancer drug without IP, you might say "their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP."
If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?
> jeopardizes
... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.
It would help tremendously if OpenAI would make it possible to apply for zero data retention (ZDR). For many business needs there is no reason to store or log any request at all.
In theory it is possible to apply (it's mentioned on multiple locations in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero-data retention only for marketing purposes.
We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.
> I get that approval needs to be given, and that there are barriers to entry.
Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?
OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.
> what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?
Product development?
My understanding is that they log for 30 days by default, for handling of bugs, and that you can request 0 days. This is from their documentation.
> And that you can request 0 days.
Right but the problem they're having is that the request is ignored.
Not just money. How are you going to handle this client's support ticket if there is no log at all?
Don't. "We're unable to provide support for your request, because you disabled retention." Easy.
You can still provide support too if you want to. You just need to ask the user what their query was, what response they got, and what response they would be expecting. You can then as the expert either spot their problem immediately, or you can run the query and see for yourself what is going on.
Sure it is a possibility that the ticket will end up closed as “unable to reproduce”, but that is always a possibility. It is not like you have to shut off all support because that might happen.
Plus many support requests are not about the content of the api responses but meta info surrounding them. Support can tell you that you are over the api quota limit even if the content of your prompt was not logged. They can also tell you if your request is missing a required parameter or if they have had 500 errors because of a bad update on their part.
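To illustrate that point, here is a minimal sketch of what "metadata only" request logging could look like: enough to answer quota, missing-parameter, and server-error questions, while prompt and completion text are never persisted. All field and function names here are invented for illustration and are not OpenAI's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class RequestMetadata:
    """Operational record of an API call; deliberately excludes prompt/response text."""
    request_id: str
    timestamp: datetime
    endpoint: str              # e.g. "/v1/chat/completions"
    status_code: int           # e.g. 200, 400, 429, 500
    error_type: Optional[str]  # e.g. "missing_required_parameter", "rate_limit_exceeded"
    tokens_billed: int         # usage accounting without retaining content

def log_request(request_id: str, endpoint: str, status_code: int,
                error_type: Optional[str], tokens_billed: int) -> RequestMetadata:
    # Only metadata is recorded; the request and response bodies are dropped here.
    return RequestMetadata(
        request_id=request_id,
        timestamp=datetime.now(timezone.utc),
        endpoint=endpoint,
        status_code=status_code,
        error_type=error_type,
        tokens_billed=tokens_billed,
    )
```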
They don't care; they still want support, and most leadership teams are unwilling to stand behind a stance of telling customers no.
... but why is not responding to a request for zero retention today better than not being able to respond to a future request? They're basically already saying no to customers who request this capability that they said they support, but their refusal is in the form of never responding.
I highly doubt this court order affects people using OpenAI services from the EU, as long as they're connecting to EU-based servers.
>> Does this court order violate GDPR or my rights under European or other privacy laws?
>> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
They didn’t say which law (the US judge’s order or EU law) they are complying with.
"You can also request zero data retention (ZDR) for eligible endpoints if you have a qualifying use-case. For details on data handling, visit our Platform Docs page."
https://openai.com/en-GB/policies/row-privacy-policy/
1. You can request it but there is no promise the request will be granted.
Defaults matter. Silicon Valley's defaults are not designed for privacy. They are designed for profit. OpenAI's default is retention. Outputs are saved by default.
It is difficult to take seriously the arguments in their memo in support of their objection to the preservation order. OpenAI already preserves outputs by default.
> In theory it is possible to apply (it's mentioned on multiple locations in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero-data retention only for marketing purposes.
What's the betting that they just write it on the website and never actually implemented it?
Tbf the approach seems pretty standard. Azure also only offers zero retention to vetted customers and otherwise retains data for up to 30 days to monitor and detect abuse. Since the possibilities for abuse are so high with these models, it would make sense that they don't simply give that kind of privilege to everyone - if only to cover their own legal position.
I wonder whether OpenAI legal can make the case for storing fuzzy hashes of the content, in the form of ssdeep[1] hashes or content-defined chunks[2] of said data, instead of the actual conversations themselves.
After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, said hashes can be used to compare whether such content has been generated.
I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if the NYT is requesting far more, or alleging some indirect form of infringement, which would invalidate my proposal. A rough sketch of the hashing idea follows the links below.
[1] https://ssdeep-project.github.io/ssdeep/index.html
[2] https://joshleeb.com/posts/content-defined-chunking.html
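To make the proposal concrete, here is a rough sketch of what retaining only fuzzy hashes might look like, assuming the third-party ssdeep Python bindings. The corpus contents, threshold, and function names are made up for illustration; this is a sketch of the idea, not how OpenAI actually stores anything.

```python
import ssdeep  # third-party bindings to libfuzzy (pip install ssdeep)

# Fuzzy hashes of the plaintiff's articles, computed once from the corpus.
# Placeholder text stands in for the real article bodies.
corpus_hashes = {
    "nyt-article-0001": ssdeep.hash("Full text of copyrighted article number one ..."),
    "nyt-article-0002": ssdeep.hash("Full text of copyrighted article number two ..."),
}

def retain(conversation_id: str, model_output: str) -> dict:
    """Store only a fuzzy hash of the model output, never the text itself."""
    return {"conversation_id": conversation_id,
            "fuzzy_hash": ssdeep.hash(model_output)}

def possible_matches(record: dict, threshold: int = 60) -> list:
    """ssdeep.compare returns a 0-100 similarity score; flag anything above the threshold."""
    return [article_id for article_id, h in corpus_hashes.items()
            if ssdeep.compare(record["fuzzy_hash"], h) >= threshold]
```

The trade-off is that only similarity to a known corpus can be checked later; the conversations themselves would be unrecoverable, which is the point.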
I haven't been able to find any of the supporting documents, but the court order makes it seem like OpenAI has been unhelpful in producing any alternative during the conversation.
For example, the judge seems to have asked if it would be possible to segregate data that the users wanted deleted from other data, but OpenAI has failed to answer. Not just denied the request, but simply ignored it.
I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.
The judges in these technical cases can be quite sophisticated and absolutely do learn terms of art. See Oracle v. Google (Java API case)
Having looked up the judge in that one (https://en.wikipedia.org/wiki/William_Alsup), who was a hobbyist BASIC programmer, one would need a judge who has coded MNIST as a pastime hobby for the same to hold here.
a smart judge who is minimally tech savvy could learn to train a model to predict MNIST in a day or two
I thought that's what GPT was for.
"you are a helpful law assistant."
"You are a long-suffering clerk speaking to a judge who's sat the same federal bench for two decades and who believes 'everything is computer' constitutes a deep technical insight."
Trying to actively circumvent the intention of a judge's order is a pretty bad idea.
That’s not circumvention though. The intent of the order is to be able to prove that ChatGPT regurgitates NYT content, not to read the personal communications of all ChatGPT users.
Deeply, deeply so. In fact, so much so that people who suggest such things show they've (luckily) not had to interact with the legal system much. Judges take an incredibly dim view of that kind of thing, haha.
All of that does fit on a real spiffy whitepaper. Let's not fool around though: every ChatGPT session is sent directly into an S3 bucket that some three-letter spook backs up onto their tapes every month. It's a database of candid, timestamped text interactions from a bunch of rubes who logged in with their Google accounts - you couldn't ask for a juicier target unless you reinvented email. Of course it's backdoored; you can't even begin to try proving me wrong.
Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me. It's about equally as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.
Of course I can't even begin trying to prove you wrong. You're making an unfalsifiable statement. You're pointing to the Russell's Teapot of sigint.
It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo, so I doubt if they had to tap links that there's a ton of voluntary cooperation.
I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.
> Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo, so I doubt if they had to tap links that there's a ton of voluntary cooperation.
The laws have changed since then and it's not for the better:
https://www.aclu.org/press-releases/congress-passing-bill-th...
Even if the laws give them this power, I believe it would be extremely difficult for an operation like this to go unnoticed (and therefore unreported) at most of these companies. MUSCULAR [1] was able to be pulled off because of the cleartext inter-datacenter traffic which was subsequently encrypted. It's hard to see how they could pull off a similar operation without the cooperation of Google which would also entail a tremendous internal cover up.
Warrantlessly installed backdoors in the log system combined with a gag order, combined with secret courts, all "perfectly legal". Not really hard to imagine.
You would have to gag a huge chunk of the engineers and I just don’t think that would work without leaks. Google’s infrastructure would not make something like that easy to do clandestinely (trying to avoid saying impossible but it gets close).
I was an SRE and SWE on technical infra at Google, specifically the logging infrastructure. I am under no gag order.
> You're pointing to the Russell's Teapot of sigint.
If there were multiple agencies with billion dollar budgets and a belief that they had an absolute national security mandate to get a teapot into solar orbit, and to lie about it, I would believe there was enough porcelain up there to make a second asteroid belt.
> I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.
The input is what's interesting.
It doesn’t change the monumental scope of the problem though.
Though I’m inclined to believe the US gov can if OpenAI can.
Metadata is spying (c) Bruce Schneier
If a CIA spook is stalking you everywhere, documenting your every visible move or interaction, you probably would call that spying. Same applies to digital.
Also, the teapot argument can be applied in reverse. We have all these documented open digital network systems everywhere, and you want to say that one of the most unprofitable, and certainly the most expensive to run, systems is somehow protecting all user data? That belief is based on what? At least selling data is based on evidence from the industry and on the actual ToSes of other similar corpos.
The comment you replied to isn't saying that metadata isn't spying. It's saying that the spies generally don't have free access to content data.
>However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
Yeah, because the definition of collection was redefined to mean accessing the full content already stored on their systems, post-interception. It wasn't considered collected until an analyst viewed it. Metadata was a laughable dog and pony show that was part of the same legal shell games at the time, over a decade ago now.
That said, from an outsider's perspective it sounded like the IC did collectively erect robust guard rails such that access to information was generally controlled and audited. I felt like this broke down a bit once sharing 702 data with other federal agencies was expanded around the same time period.
These days, those guard rails might be the only thing standing in the way of democracy as we know it ending in the US. AI processing applied to full-take collection is terrifying, just ask the Chinese.
> However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
This was the point of a lot of the Five Eyes programs. It's not legal for the US to spy on its own citizens, but it isn't against the law for us to do it to the Australians... who are all too happy to reciprocate.
> Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo...
Snowden's info wasn't really news for many of us who were paying attention in the aftermath of 9/11: https://en.wikipedia.org/wiki/Room_641A (This was huge on slashdot at the time... )
There's no way to know, but it's safer to assume.
My pet conspiracy theory is that the three-letter agencies actively encourage the idea that they are omnipresent and all-knowing, because it ultimately plays into their hands. Sorta like a Santa Claus for citizens.
> because it ultimately plays into their hands.
How? Scared criminals aren't going to make themselves easy to find. Three-letter spooks would almost certainly prefer to smoke-test a docile population than a paranoid one.
In fact, it kinda overwhelmingly seems like the opposite happens. Remember the 2015 San Bernardino shooting that was pushed into the national news for no reason? Remember how the FBI bloviated about how hard it was to get information from an iPhone, 3 years after Tim Cook's assent to the PRISM program?
Stuff like this is almost certainly theater. If OpenAI perceived retention as a life-or-death issue, they would be screaming about this case at the top of their lungs. If the FBI perceived it as a life-or-death issue, we would never hear about it in our lifetimes. The dramatic and protracted public fights suggest to me that OpenAI simply wants an alibi: some sort of user story that smells like secure and private technology, but in actuality is very obviously neither.
Maybe I'm wrong, and maybe this was discussed previously, but of course OpenAI keeps our data; they use it for training!
As the linked page points out you can turn this off in settings if you are an end user or choose zero retention if you are an API user.
I mean, they already stole and used all the copyrighted material they could find to train the thing. Am I supposed to believe that they won't use my data just because I tick a checkbox?
Agreed, I have a hard time believing anything the eye-scanning crypto coin (Worldcoin or whatever) guy says at this point.
I wish I could test drive your brain to experience a world where one believes that would stop them from stealing your data.
>Of course it's backdoored, you can't even begin to try proving me wrong.
On the contrary.
>Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me.
I think you're being unduly paranoid. /s
https://www.theverge.com/2024/6/13/24178079/openai-board-pau...
https://www.wsj.com/tech/ai/the-real-story-behind-sam-altman...
Think of all the complete garbage interactions you'd have to sift through to find anything useful from a national security standpoint. The data is practically obfuscated by virtue of its banality.
I’ve done my part cluttering it with my requests for the same banana bread recipe like 5 separate times.
"We kill people based on metadata." - National Security Agency Gen. Michael Hayden
Raw data with time-series significance is their absolute favorite. You might argue something like Google Maps data is "obfuscated by virtue of its banality" until you catch the right person in the wrong place. ChatGPT sessions are the same way, and it's going to be fed into aggregate surveillance systems in the way modern telecom and advertiser data is.
This is mostly security theater, and generally not worth the lift when you consider the steps needed to unlock the value of that data in the context of investigations.
-The Privacy and Civil Liberties Oversight Board’s 2014 review of the NSA “Section 215” phone-record program found no instance in which the dragnet produced a counter-terror lead that couldn’t have been obtained with targeted subpoenas. https://en.m.wikipedia.org/wiki/Privacy_and_Civil_Liberties_...
-After Boston, Paris, Manchester, and other attacks, post-mortems showed the perpetrators were already in government databases. Analysts simply didn’t connect the dots amid the flood of benign hits. https://www.newyorker.com/magazine/2015/01/26/whole-haystack
-Independent tallies suggest dozens of civilians killed for every intended high-value target in Yemen and Pakistan, largely because metadata mis-identifies phones that change pockets. https://committees.parliament.uk/writtenevidence/36962/pdf
Search engines have been doing this since the mid-90s and have only improved. To think that any data is obfuscated by being part of some huge volume of other data is a fallacy at best.
Search engines use our data for completely different purposes.
That doesn't negate the GP's point. It's easy to make datasets searchable.
Searchable? You have to know what to search for, and you have to rule out false positives. How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something? That's not something a search function can distinguish. It requires a human to sift through that data.
> How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something?
Metadata and investigation.
> That's not something a search function can distinguish.
We know that it can narrow down hugely from the initial volume.
> It requires a human to sift through that data.
Yes, the point of collating, analysing, and searching data is not to make final judgements but to find targets for investigation by the available agents. That's the same reason we all use search engines: to narrow things down. They never produce exactly what we intend by intention alone; we still have to read the final results. Magic is still some way off.
You're acting as if we can automate humans out of the loop entirely, which would be a straw man. Is anyone saying we can get rid of the police or security agencies by using AI? Or perhaps AI will become the police, perhaps it will conduct traffic stops using driverless cars and robots? I suppose it could happen, though I'm not sure what the relevance would be here.
The data is obfuscated and the cost to unlock the value of it is often not worth the effort.
And yet billions of dollars (at least) has gone into it. A whole group of people with access to the data and the means to sift it disagree and are willing to put their money behind it, so your bare assertions count for nowt.
Great. What do you think that proves? That doesn't negate my inital argument. The data is largely useless, and often counterproductive. The evidence shows the vast majority of plots are foiled through conventional means, and ruling out false positives is more trouble than it's worth. I cited sources in this thread. Where are your sources?
"Corporations and the US government are spending money on it, so it must be useful." Are you serious? Lmao.