> How will you store my data and who can access it?
> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.
> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.
So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.
Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.
No, there is a whole news cycle about how chats you delete aren't actually being deleted because of a lawsuit; they essentially have to respond. It's not an attempt to spin the lawsuit; it's about reassuring their customers.
The part where they go out of their way to call the lawsuit baseless is spin though, and mixing that with this messaging presents a mixed message. The NYT lawsuit is objectively not baseless. OpenAI did train on the Times, and ChatGPT does output information from that training. That's the basis of the lawsuit. NYT may lose, this could end up being considered fair use, and it might ultimately be a flimsy basis for a lawsuit, but to say it's baseless (and with nothing to back that up) is spin and makes this message less reassuring.
No, it's not. It's absolutely standard corporate communications. If they're fighting the lawsuit, that is essentially the only thing they can say about it. Ford Motor Company would say the same thing (well, they'd probably say "meritless and frivolous").
No, this isn't even close to spin, it's just a standard part of defending your case. In the US tort system you need to be constantly publicly saying you did nothing wrong. Any wavering on that point could be used against you in court.
This is a funny thread. You say "No" but then restate the point with slightly different words. As if anything a company says publicly about ongoing litigation isn't spin.
I suppose it's down to how you define "spin". Personally I'm in favor of a definition of the term that doesn't excessively dilute it.
Can you share your definition? This is actually quite puzzling, because as far as I know "spin" has always been associated with presenting things in a way that benefits you. Like, decades ago, they could have the show "Bill O'Reilly's No Spin Zone" and everybody knew the premise was that they'd argue against guests who were trying to tell a "massaged" version of the story, and that they'd go for some actual truth (fwiw I thought the whole show was full of crap, but the name was not confusing or ambiguous).
I’m not aware of any definition of “spin” where being conventional is a defense against that accusation. Actually, that was the (imagined) value-add of the show, that conventional corporate and political messaging is heavily spun.
Spin, like you illustrate in your comment, has connotations of distorting the truth.
Simply denying the allegations isn't really spinning anything; it's just denying the allegations. And the thing I dislike about characterizing something like this as spin is that it defangs the term by removing all those connotations and instead turning it into just a buzzwordy way of saying, "I disagree with what this person said."
They didn’t just deny the allegations. They called the case baseless. The case is clearly not baseless, in the sense that there’s at least enough of a basis that the court didn’t vacate the order to preserve the chats.
It seems to me that the discussion of whether or not it is spin has turned into a discussion of which party people basically agree with.
My personal opinion is that OpenAI will probably win, or at least get away with a pretty minor fine or something like that. However, the communications coming from both parties in the case should be assumed to be corporate spin until proven otherwise. And, calling an unfinished case baseless is, at least, a bit presumptuous!
That's legalese. You can't interpret legal jargon using vernacular definitions of the terms.
The source is a message intended for mass consumption, so it should not be interpreted in legalese.
How you want the law to work, and how the law works, are not necessarily the same thing.
There's a difference between "we are choosing to phrase it this way" versus "our lawyers told us we have to say this". "Spin" is generally seen as a voluntary action, which makes the former a clearcut case of it, the latter less so.
1) taking your lawyer’s advice is a voluntary action (although it is probably a good one)
2) I don't understand the distinction being made between voluntary and involuntary, in the sense that a corporation is a thing made up of people; it doesn't have a will in and of itself, so the communications it sends must always actually be made by somebody inside the corporation (whether a lawyer, a marketing person, or, in the unlikely event that somebody lets them out, an engineer).
No? "Spin" implies there was something else they could possibly say.
Indeed. Taken to its conclusion, this thread suggests that corporations are justified in saying whatever they want in order to further their own ends.
Including lies.
I'd like to aim a little higher, maybe towards expecting correspondence with reality?
IOW, yes, there is no law saying OpenAI can't try to spin this. But it's still a shitty, non-factually-based choice to make.
I haven't heard that interpretation; I might call it spin of spin.
If you're being held at gunpoint and forced to lie, your words are still a lie. Whether you were forced or not is a separate dimension.
That is unrelated to what the expression means.
I’m typing these words from a brain that has absorbed copyrighted works.
My understanding is that they have to keep chats based on an order, *as a result of their previous accidental deletion of potential evidence in the case*[0].
And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]
[0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[1] https://help.openai.com/en/articles/8809935-how-to-delete-an...
They should include the part where the order is a result of them deleting things they shouldn’t have then. You know, if this isn’t spin.
Then again, I'm starting to think OpenAI is gathering a cult-leader-like following, where any negative comment will result in devoted followers, or those with something to gain, immediately jumping to its defense no matter how flimsy the ground.
>They should include the part where the order is a result of them deleting things they shouldn’t have then. You know, if this isn’t spin.
From what I can tell from the court filings, prior to the judge's order to retain everything, the request to retain everything was coming from the plaintiff, with openai objecting to the request and refusing to comply in the meantime. If so, it's a bit misleading to characterize this as "deleting things they shouldn’t have", because what they "should have" done wasn't even settled. That's a bit rich coming from someone accusing openai of "spin".
Here’s a good article that explains what you may be missing.
https://techcrunch.com/2024/11/22/openai-accidentally-delete...
Your linked article talks about openai deleting training data. I don't see how that's related to the current incident, which is about user queries. The ruling from the judge for openai to retain all user queries also didn't reference this incident.
Sure.
Without this devolving into a tit for tat: the article explains, for those following this conversation, why it's been elevated to a court order and not just an expectation to preserve.
> the article explains for those following this conversation why it’s been elevated to a court order
That article does nothing of the sort and, indeed, it is talking about a completely separate incident of deleting data.
No worries. I can’t force understanding on anyone.
Here. I had an LLM summarize it for you.
A court order now requires OpenAI to retain all user data, including deleted ChatGPT chats, as part of the ongoing copyright lawsuit brought by The New York Times (NYT) and other publishers[1][2][6][7]. This order was issued because the NYT argued that evidence of copyright infringement—such as AI outputs closely matching NYT articles—could be lost if OpenAI continued its standard practice of deleting user data after 30 days[2][6][7].
This new requirement is directly related to a 2024 incident where OpenAI accidentally deleted critical data that NYT lawyers had gathered during the discovery process. In that incident, OpenAI engineers erased programs and search result data stored by NYT's legal team on dedicated virtual machines provided for examining OpenAI's training data[3][4][5]. Although OpenAI recovered some of the data, the loss of file structure and names rendered it largely unusable for the lawyers’ purposes[3][5]. The court and NYT lawyers did not believe the deletion was intentional, but it highlighted the risks of relying on OpenAI’s internal data retention and deletion practices during litigation[3][4][5].
The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7]. The order aims to prevent any further loss of potentially relevant information as the case proceeds. OpenAI is appealing the order, arguing it conflicts with user privacy and their established data deletion policies[1][2][6][7].
Sources:
[1] OpenAI Appeals Court Order Requiring Retention of Consumer Data https://www.pymnts.com/artificial-intelligence-2/2025/openai...
[2] ‘An Inappropriate Request’: OpenAI Appeals ChatGPT Data Retention Court Order https://www.eweek.com/news/openai-privacy-appeal-new-york-ti...
[3] OpenAI Deletes Legal Data in a Lawsuit From the New York Times https://www.businessinsider.com/openai-delete-legal-data-law...
[4] NYT vs OpenAI case: OpenAI accidentally deleted case data https://www.medianama.com/2024/11/223-new-york-times-openai-...
[5] New York Times Says OpenAI Erased Potential Lawsuit Evidence https://www.wired.com/story/new-york-times-openai-erased-pot...
[6] How we're responding to The New York Times' data ... - OpenAI https://openai.com/index/response-to-nyt-data-demands/
[7] Why OpenAI Won't Delete Your ChatGPT Chats Anymore: New York ... https://coincentral.com/why-openai-wont-delete-your-chatgpt-...
[8] A Federal Judge Ordered OpenAI to Stop Deleting Data - Adweek https://www.adweek.com/media/a-federal-judge-ordered-openai-...
[9] OpenAI confronts user panic over court-ordered retention of ChatGPT logs https://arstechnica.com/tech-policy/2025/06/openai-confronts...
[10] OpenAI Appeals ‘Sweeping, Unprecedented Order’ Requiring It Maintain All ChatGPT Logs https://gizmodo.com/openai-appeals-sweeping-unprecedented-or...
[11] OpenAI accidentally deleted potential evidence in NY ... - TechCrunch https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[12] OpenAI's Shocking Blunder: Key Evidence Vanishes in NY Times ... https://www.eweek.com/news/openai-deletes-potential-evidence...
[13] Judge allows 'New York Times' copyright case against OpenAI to go ... https://www.npr.org/2025/03/26/nx-s1-5288157/new-york-times-...
[14] OpenAI Data Retention Court Order: Implications for Everybody https://hackernoon.com/openai-data-retention-court-order-imp...
[15] Sam Altman calls for 'AI privilege' as OpenAI clarifies court order to retain temporary and deleted ChatGPT sessions https://venturebeat.com/ai/sam-altman-calls-for-ai-privilege...
[16] Court orders OpenAI to preserve all ChatGPT logs, including deleted ... https://techstartups.com/2025/06/06/court-orders-openai-to-p...
[17] OpenAI deleted NYT copyright case evidence, say lawyers https://www.theregister.com/2024/11/21/new_york_times_lawyer...
[18] OpenAI slams court order to save all ChatGPT logs, including ... https://simonwillison.net/2025/Jun/5/openai-court-order/
[19] OpenAI accidentally deleted potential evidence in New York Times ... https://mashable.com/article/openai-accidentally-deleted-pot...
[20] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://news.ycombinator.com/item?id=44185913
[21] OpenAI slams court order to save all ChatGPT logs, including deleted chats https://arstechnica.com/tech-policy/2025/06/openai-says-cour...
[22] After court order, OpenAI is now preserving all ChatGPT and API logs https://www.reddit.com/r/LocalLLaMA/comments/1l3niws/after_c...
[23] OpenAI accidentally erases potential evidence in training data lawsuit https://www.theverge.com/2024/11/21/24302606/openai-erases-e...
[24] OpenAI "accidentally" erased ChatGPT training findings as lawyers ... https://www.reddit.com/r/aiwars/comments/1gwxr94/openai_acci...
[25] OpenAI appeals data preservation order in NYT copyright case https://www.reuters.com/business/media-telecom/openai-appeal...
You linked this article:
https://techcrunch.com/2024/11/22/openai-accidentally-delete...
Gruez said that is talking about an incident in this case but unrelated to the judge's order in question.
You said the article "explains for those following this conversation why it’s been elevated to a court order" but it doesn't actually explain that. It is talking about separate data being deleted in a different context. It is not user chats and access logs. It is the data that was used to train the models.
I pointed that out a second time since it seemed to be misunderstood.
Then you posted an LLM summary of something unrelated to the point being made.
Now we're here.
As you say, one cannot force understanding on another; we all have to do our part. ;)
Edit:
> The court order to retain all user data is a direct response to concerns that important evidence could be lost—just as it was in the accidental deletion incident[2][6][7].
What did you prompt the LLM with for it to reach this conclusion? The [2][6][7] citations similarly don't seem to explain how that incident from months ago informed the judge's recent decision. Anyway, I'm not saying the conclusion is wrong, I'm saying the article you linked does not support the conclusion.
I think in your rush to reply you may have not read the summarization.
Calm down, cool off, and read it again.
The point is that the circumstances of the incident in 2024 are directly related to the how and why of the NYT lawyers' request and the judge's order.
The article I linked was to the incident in 2024.
Not everything has to be about pedantry and snark, even on HN.
Edit: I see you edited your response after re-reading the summarization. I’m glad cooler heads have prevailed.
The prompt was simply “What is the relation, if any, between OpenAI being ordered to retain user data and the incident from 2024 where OpenAI accidentally deleted the NYT lawyers data while they were investigating whether OpenAI had used their data to train their models?”
> I see you edited your response after re-reading the summarization.
Just to be clear, the summary is not convincing. I do understand the idea but none of the evidence presented so far suggests that was the reason. The court expected that the data would be retained, the court learned that it was not, the court gave an order for it to be retained. That is the seeming reason for the order.
Put another way: if the incident last year had not happened, the court would still have issued the order currently under discussion.
> It's not an attempt to spin the lawsuit; it's about reassuring their customers.
It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.
It would be extremely unusual (and likely very stupid) for the defendant in a lawsuit to post publicly that the plaintiff maybe has a point.
Why does OpenAI have any obligation to present the NYTs side?
Who said 'obligation'?
It's hard to reassure your customers if you can't address the elephant in the room. OpenAI brought this on themselves by flouting copyright law and assuring everyone else that such aggressive and probably-illegal action would be retroactively acceptable once they were too big to fail.
If the stored data is found to be relevant to the lawsuit during discovery, it becomes available to at least both parties involved and the court, as far as I understand.
Obviously OpenAI's point of view will be their point of view. They are going to call this lawsuit baseless, or else they would not be fighting it.
To me it's pretty clear how this will play out. You will need to buy additional credits or subscriptions through these LLMs that feed payments back to the likes of the NYT and book publishers. It's all stolen. I don't even want to hear it. This company doesn't want to pay up and is willing to let users' privacy hang in the balance to draw the case out until they get sure footing with their device launches or the like (or additional markets like enterprise, etc.).
> It's all stolen.
LLMs are not massive archives of data. The big models are a few TB in size. No one is forgoing a NYT subscription because they can ask ChatGPT to print out NYT news stories.
Regardless of the representation, some people are replacing news consumption generally with answers from ChatGPT.
Copyright is pretty narrowly tailored to verbatim reproduction of content so I doubt they will have to pay anything.
Even then, it's possible to prompt the model to exactly reproduce the copyrighted works.
Please show me one of these prompts
NYT has examples in their legal complaint. See page 30.
> So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.
I am not an Open AI stan, but this needs to be responded to.
The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.
This is like saying, "Well, I know they didn't want to go skydiving, but we forced them to go skydiving and they died because they had a stroke mid-air; it's their fault they died."
Anyone who makes promises about data security is at best incompetent and at worst dishonest.
Data is a toxic asset. -- https://www.schneier.com/essays/archives/2016/03/data_is_a_t...
> Anyone who makes promises about data security is at best incompetent and at worst dishonest.
Shouldn't that be "at best dishonest and at worst incompetent"?
I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?
An incompetent but honest person is more likely to accept correction and respond to feedback generally.
Maybe because you are not an OpenAI user. I am. I find it useful and I pay for it. I don't want my data to be retained beyond what's promised in the Terms of Use and Privacy Policy.
I don't think the judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.
You live on a pirate ship. You have no right to ignore the ethics and law of that just because you could be hurt in a conflict related to piracy.
The OpenAI Privacy Policy specifically allows them to keep data as required by law.
> who don't even care about NYT's content or bypassing their paywalls.
Whether or not you care is not relevant, and is usually the case for customers. If a drug company resold an expensive cancer drug without IP, you might say "their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP."
If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?
> jeopardizes
... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.
It would help tremendously if OpenAI would make it possible to apply for zero data retention (ZDR). For many business needs there is no reason to store or log any request at all.
In theory it is possible to apply (it's mentioned on multiple locations in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero-data retention only for marketing purposes.
We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.
> I get that approval needs to be given, and that there are barriers to entry.
Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?
OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.
> what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?
Product development?
My understanding is that they log for 30 days by default, for handling of bugs, and that you can request 0 days. This is from their documentation.
> And that you can request 0 days.
Right but the problem they're having is that the request is ignored.
Not just money. How are you going to handle this client's support ticket if there is no log at all?
Don't. "We're unable to provide support for your request, because you disabled retention." Easy.
You can still provide support too if you want to. You just need to ask the user what their query was, what response they got, and what response they would be expecting. You can then as the expert either spot their problem immediately, or you can run the query and see for yourself what is going on.
Sure it is a possibility that the ticket will end up closed as “unable to reproduce”, but that is always a possibility. It is not like you have to shut off all support because that might happen.
Plus many support requests are not about the content of the api responses but meta info surrounding them. Support can tell you that you are over the api quota limit even if the content of your prompt was not logged. They can also tell you if your request is missing a required parameter or if they have had 500 errors because of a bad update on their part.
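To illustrate that point, here is a minimal sketch of what "metadata only" request logging could look like: enough to answer quota, missing-parameter, and server-error questions, while prompt and completion text are never persisted. All field and function names here are invented for illustration and are not OpenAI's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class RequestMetadata:
    """Operational record of an API call; deliberately excludes prompt/response text."""
    request_id: str
    timestamp: datetime
    endpoint: str              # e.g. "/v1/chat/completions"
    status_code: int           # e.g. 200, 400, 429, 500
    error_type: Optional[str]  # e.g. "missing_required_parameter", "rate_limit_exceeded"
    tokens_billed: int         # usage accounting without retaining content

def log_request(request_id: str, endpoint: str, status_code: int,
                error_type: Optional[str], tokens_billed: int) -> RequestMetadata:
    # Only metadata is recorded; the request and response bodies are dropped here.
    return RequestMetadata(
        request_id=request_id,
        timestamp=datetime.now(timezone.utc),
        endpoint=endpoint,
        status_code=status_code,
        error_type=error_type,
        tokens_billed=tokens_billed,
    )
```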
They don't care; they still want support, and most leadership teams are unwilling to stand behind a stance of telling customers no.
... but why is not responding to a request for zero retention today better than not being able to respond to a future request? They're basically already saying no to customers who request this capability that they said they support, but their refusal is in the form of never responding.
I highly doubt this court order affects people using OpenAI services from the EU, as long as they're connecting to EU-based servers.
>> Does this court order violate GDPR or my rights under European or other privacy laws?
>> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
They didn’t say which law (the US judge’s order or EU law) they are complying with.
"You can also request zero data retention (ZDR) for eligible endpoints if you have a qualifying use-case. For details on data handling, visit our Platform Docs page."
https://openai.com/en-GB/policies/row-privacy-policy/
1. You can request it but there is no promise the request will be granted.
Defaults matter. Silicon Valley's defaults are not designed for privacy. They are designed for profit. OpenAI's default is retention. Outputs are saved by default.
It is difficult to take seriously the arguments in their memo in support of their objection to the preservation order. OpenAI already preserves outputs by default.
> In theory it is possible to apply (it's mentioned on multiple locations in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero-data retention only for marketing purposes.
What's the betting that they just write it on the website and never actually implemented it?
Tbf the approach seems pretty standard. Azure also only offers zero retention to vetted customers and otherwise retains data for up to 30 days to monitor and detect abuse. Since the possibilities for abuse are so high with these models, it would make sense that they don't simply give that kind of privilege to everyone - if only to cover their own legal position.
I wonder whether OpenAI legal can make the case for storing fuzzy hashes of the content, in the form of ssdeep[1] hashes or content-defined chunks[2] of said data, instead of the actual conversations themselves.
After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, said hashes can be used to compare whether such content has been generated.
I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if the NYT is requesting far more, or alleging some indirect form of infringement, which would invalidate my proposal. A rough sketch of the hashing idea follows the links below.
[1] https://ssdeep-project.github.io/ssdeep/index.html
[2] https://joshleeb.com/posts/content-defined-chunking.html
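To make the proposal concrete, here is a rough sketch of what retaining only fuzzy hashes might look like, assuming the third-party ssdeep Python bindings. The corpus contents, threshold, and function names are made up for illustration; this is a sketch of the idea, not how OpenAI actually stores anything.

```python
import ssdeep  # third-party bindings to libfuzzy (pip install ssdeep)

# Fuzzy hashes of the plaintiff's articles, computed once from the corpus.
# Placeholder text stands in for the real article bodies.
corpus_hashes = {
    "nyt-article-0001": ssdeep.hash("Full text of copyrighted article number one ..."),
    "nyt-article-0002": ssdeep.hash("Full text of copyrighted article number two ..."),
}

def retain(conversation_id: str, model_output: str) -> dict:
    """Store only a fuzzy hash of the model output, never the text itself."""
    return {"conversation_id": conversation_id,
            "fuzzy_hash": ssdeep.hash(model_output)}

def possible_matches(record: dict, threshold: int = 60) -> list:
    """ssdeep.compare returns a 0-100 similarity score; flag anything above the threshold."""
    return [article_id for article_id, h in corpus_hashes.items()
            if ssdeep.compare(record["fuzzy_hash"], h) >= threshold]
```

The trade-off is that only similarity to a known corpus can be checked later; the conversations themselves would be unrecoverable, which is the point.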
I haven't been able to find any of the supporting documents, but the court order makes it seem like OpenAI has been unhelpful in producing any alternative during the conversation.
For example, the judge seems to have asked if it would be possible to segregate data that the users wanted deleted from other data, but OpenAI has failed to answer. Not just denied the request, but simply ignored it.
I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.
The judges in these technical cases can be quite sophisticated and absolutely do learn terms of art. See Oracle v. Google (Java API case)
Having looked up the judge in that one (https://en.wikipedia.org/wiki/William_Alsup), who was a hobbyist BASIC programmer, one would need a judge who has coded MNIST as a pastime hobby for the same to hold here.
a smart judge who is minimally tech savvy could learn to train a model to predict MNIST in a day or two
I thought that's what GPT was for.
"you are a helpful law assistant."
"You are a long-suffering clerk speaking to a judge who's sat the same federal bench for two decades and who believes 'everything is computer' constitutes a deep technical insight."
Trying to actively circumvent the intention of a judge's order is a pretty bad idea.
That’s not circumvention though. The intent of the order is to be able to prove that ChatGPT regurgitates NYT content, not to read the personal communications of all ChatGPT users.
Deeply, deeply so. In fact, so much so that people who suggest such things show they've (luckily) not had to interact with the legal system much. Judges take an incredibly dim view of that kind of thing, haha.
All of that does fit on a real spiffy whitepaper. Let's not fool around though: every ChatGPT session is sent directly into an S3 bucket that some three-letter spook backs up onto their tapes every month. It's a database of candid, timestamped text interactions from a bunch of rubes who logged in with their Google accounts - you couldn't ask for a juicier target unless you reinvented email. Of course it's backdoored; you can't even begin to try proving me wrong.
Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me. It's about equally as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.
Of course I can't even begin trying to prove you wrong. You're making an unfalsifiable statement. You're pointing to the Russell's Teapot of sigint.
It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo, so I doubt if they had to tap links that there's a ton of voluntary cooperation.
I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.
> Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo, so I doubt if they had to tap links that there's a ton of voluntary cooperation.
The laws have changed since then and it's not for the better:
https://www.aclu.org/press-releases/congress-passing-bill-th...
Even if the laws give them this power, I believe it would be extremely difficult for an operation like this to go unnoticed (and therefore unreported) at most of these companies. MUSCULAR [1] was able to be pulled off because of the cleartext inter-datacenter traffic which was subsequently encrypted. It's hard to see how they could pull off a similar operation without the cooperation of Google which would also entail a tremendous internal cover up.
Warrantlessly installed backdoors in the log system combined with a gag order, combined with secret courts, all "perfectly legal". Not really hard to imagine.
You would have to gag a huge chunk of the engineers and I just don’t think that would work without leaks. Google’s infrastructure would not make something like that easy to do clandestinely (trying to avoid saying impossible but it gets close).
I was an SRE and SWE on technical infra at Google, specifically the logging infrastructure. I am under no gag order.
> You're pointing to the Russell's Teapot of sigint.
If there were multiple agencies with billion dollar budgets and a belief that they had an absolute national security mandate to get a teapot into solar orbit, and to lie about it, I would believe there was enough porcelain up there to make a second asteroid belt.
> I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.
The input is what's interesting.
It doesn’t change the monumental scope of the problem though.
Though I’m inclined to believe the US gov can if OpenAI can.
Metadata is spying (c) Bruce Schneier
If a CIA spook is stalking you everywhere, documenting your every visible move or interaction, you probably would call that spying. Same applies to digital.
Also, the teapot argument can be applied in reverse. We have all these documented open digital network systems everywhere, and you want to say that one of the most unprofitable, and certainly the most expensive to run, systems is somehow protecting all user data? That belief is based on what? At least selling data is based on evidence from the industry and on the actual ToSes of other similar corpos.
The comment you replied to isn't saying that metadata isn't spying. It's saying that the spies generally don't have free access to content data.
>However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
Yeah, because the definition of collection was redefined to mean accessing the full content already stored on their systems, post-interception. It wasn't considered collected until an analyst viewed it. Metadata was a laughable dog and pony show that was part of the same legal shell games at the time, over a decade ago now.
That said, from an outsider's perspective it sounded like the IC did collectively erect robust guard rails such that access to information was generally controlled and audited. I felt like this broke down a bit once sharing 702 data with other federal agencies was expanded around the same time period.
These days, those guard rails might be the only thing standing in the way of democracy as we know it ending in the US. AI processing applied to full-take collection is terrifying, just ask the Chinese.
> However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
This was the point of a lot of the Five Eyes programs. It's not legal for the US to spy on its own citizens, but it isn't against the law for us to do it to the Australians... who are all too happy to reciprocate.
> Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo...
Snowden's info wasn't really news for many of us who were paying attention in the aftermath of 9/11: https://en.wikipedia.org/wiki/Room_641A (This was huge on slashdot at the time... )
There's no way to know, but it's safer to assume.
My pet conspiracy theory is that the three-letter agencies actively encourage the idea that they are omnipresent and all-knowing, because it ultimately plays into their hands. Sorta like a Santa Claus for citizens.
> because it ultimately plays into their hands.
How? Scared criminals aren't going to make themselves easy to find. Three-letter spooks would almost certainly prefer to smoke-test a docile population than a paranoid one.
In fact, it kinda overwhelmingly seems like the opposite happens. Remember the 2015 San Bernardino shooting that was pushed into the national news for no reason? Remember how the FBI bloviated about how hard it was to get information from an iPhone, 3 years after Tim Cook's assent to the PRISM program?
Stuff like this is almost certainly theater. If OpenAI perceived retention as a life-or-death issue, they would be screaming about this case at the top of their lungs. If the FBI perceived it as a life-or-death issue, we would never hear about it in our lifetimes. The dramatic and protracted public fights suggest to me that OpenAI simply wants an alibi: some sort of user story that smells like secure and private technology, but in actuality is very obviously neither.
Maybe I'm wrong, and maybe this was discussed previously, but of course OpenAI keeps our data; they use it for training!
As the linked page points out you can turn this off in settings if you are an end user or choose zero retention if you are an API user.
I mean, they already stole and used all the copyrighted material they could find to train the thing. Am I supposed to believe that they won't use my data just because I tick a checkbox?
Agreed, I have a hard time believing anything the eye-scanning crypto coin (Worldcoin or whatever) guy says at this point.
I wish I could test drive your brain to experience a world where one believes that would stop them from stealing your data.
>Of course it's backdoored, you can't even begin to try proving me wrong.
On the contrary.
>Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me.
I think you're being unduly paranoid. /s
https://www.theverge.com/2024/6/13/24178079/openai-board-pau...
https://www.wsj.com/tech/ai/the-real-story-behind-sam-altman...
Think of all the complete garbage interactions you'd have to sift through to find anything useful from a national security standpoint. The data is practically obfuscated by virtue of its banality.
I’ve done my part cluttering it with my requests for the same banana bread recipe like 5 separate times.
"We kill people based on metadata." - National Security Agency Gen. Michael Hayden
Raw data with time-series significance is their absolute favorite. You might argue something like Google Maps data is "obfuscated by virtue of its banality" until you catch the right person in the wrong place. ChatGPT sessions are the same way, and it's going to be fed into aggregate surveillance systems in the way modern telecom and advertiser data is.
This is mostly security theater, and generally not worth the lift when you consider the steps needed to unlock the value of that data in the context of investigations.
-The Privacy and Civil Liberties Oversight Board’s 2014 review of the NSA “Section 215” phone-record program found no instance in which the dragnet produced a counter-terror lead that couldn’t have been obtained with targeted subpoenas. https://en.m.wikipedia.org/wiki/Privacy_and_Civil_Liberties_...
-After Boston, Paris, Manchester, and other attacks, post-mortems showed the perpetrators were already in government databases. Analysts simply didn’t connect the dots amid the flood of benign hits. https://www.newyorker.com/magazine/2015/01/26/whole-haystack
-Independent tallies suggest dozens of civilians killed for every intended high-value target in Yemen and Pakistan, largely because metadata mis-identifies phones that change pockets. https://committees.parliament.uk/writtenevidence/36962/pdf
Search engines have been doing this since the mid-90s and have only improved. To think that any data is obfuscated by being part of some huge volume of other data is a fallacy at best.
Search engines use our data for completely different purposes.
That doesn't negate the GP's point. It's easy to make datasets searchable.
Searchable? You have to know what to search for, and you have to rule out false positives. How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something? That's not something a search function can distinguish. It requires a human to sift through that data.
> How do you discern a person roleplaying some secret agent scenario vs. a person actually plotting something?
Metadata and investigation.
> That's not something a search function can distinguish.
We know that it can narrow down hugely from the initial volume.
> It requires a human to sift through that data.
Yes, the point of collating, analysing, and searching data is not to make final judgements but to find targets for investigation by the available agents. That's the same reason we all use search engines: to narrow things down. They never produce exactly what we intend by intention alone; we still have to read the final results. Magic is still some way off.
You're acting as if we can automate humans out of the loop entirely, which would be a straw man. Is anyone saying we can get rid of the police or security agencies by using AI? Or perhaps AI will become the police, perhaps it will conduct traffic stops using driverless cars and robots? I suppose it could happen, though I'm not sure what the relevance would be here.
The data is obfuscated and the cost to unlock the value of it is often not worth the effort.
And yet billions of dollars (at least) has gone into it. A whole group of people with access to the data and the means to sift it disagree and are willing to put their money behind it, so your bare assertions count for nowt.
Great. What do you think that proves? That doesn't negate my inital argument. The data is largely useless, and often counterproductive. The evidence shows the vast majority of plots are foiled through conventional means, and ruling out false positives is more trouble than it's worth. I cited sources in this thread. Where are your sources?
"Corporations and the US government are spending money on it, so it must be useful." Are you serious? Lmao.