How scientists are using Claude to accelerate research and discovery

2026-01-18 3:22 · www.anthropic.com

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Last October we launched Claude for Life Sciences—a suite of connectors and skills that made Claude a better scientific collaborator. Since then, we've invested heavily in making Claude the most capable model for scientific work, with Opus 4.5 showing significant improvements in figure interpretation, computational biology, and protein understanding benchmarks. These advances, informed by our partnerships with researchers in academia and industry, reflect our commitment to understanding exactly how scientists are using AI to accelerate progress.

We’ve also been working closely with scientists through our AI for Science program, which provides free API credits to leading researchers working on high-impact scientific projects around the world.

These researchers have developed custom systems that use Claude in ways that go far beyond literature reviews or coding assistance. In the labs we spoke to, Claude is a collaborator that works across all stages of the research process: making it easier and more cost-effective to decide which experiments to run, using a variety of tools to compress projects that normally take months into hours, and finding patterns in massive datasets that humans might overlook. In many cases it’s eliminating bottlenecks, handling tasks that require deep knowledge and had previously been impossible to scale; in others, it’s enabling research approaches that weren’t practical before.

In other words, Claude is beginning to reshape how these scientists work—and point them towards novel scientific insights and discoveries.

One bottleneck in biological research is the fragmentation of tools: there are hundreds of databases, software packages, and protocols available, and researchers spend substantial time selecting from and mastering various platforms. That’s time that, in a perfect world, would be spent on running experiments, interpreting data, or pursuing new projects.

Biomni, an agentic AI platform from Stanford University, collects hundreds of tools, packages, and datasets into a single system that a Claude-powered agent can navigate. Researchers give it requests in plain English; Biomni automatically selects the appropriate resources. It can form hypotheses, design experimental protocols, and perform analyses across more than 25 biological subfields.

Consider the example of a genome-wide association study (GWAS), a search for genetic variants linked to some trait or disease. Perfect pitch, for instance, has a strong genetic basis. Researchers would take a very large group of people—some who are able to produce a musical note without any reference tone, and others you would never invite to karaoke—and scan their genomes for genetic variants that show up more often in one group than another.

The genome scanning is (relatively) simple. It’s the process of analyzing and making sense of the data that’s time-consuming: genomic data comes in messy formats and needs extensive cleaning; researchers must control for confounding and deal with missing data; once they identify any “hits,” they need to figure out what they actually mean—what gene is nearby (since GWAS only points to locations in a genome), what cell types it’s expressed in, what biological pathway it might affect, and so on. Each step might involve different tools, different file formats, and a lot of manual decision-making. It’s a tedious process. A single GWAS can take months. But in an early trial of Biomni, it took 20 minutes.
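
To make the analysis step concrete, here is a minimal sketch of the core association test at the heart of a GWAS. It is illustrative only: the data is synthetic, and a real pipeline, whether run by hand or through an agent like Biomni, layers on quality control, population-structure correction, and covariate handling.

```python
# Per-variant allelic association test on synthetic data (illustration only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_people, n_variants = 1_000, 2_000
genotypes = rng.integers(0, 3, size=(n_people, n_variants))  # 0/1/2 minor-allele copies
phenotype = rng.integers(0, 2, size=n_people)                # 1 = has the trait

p_values = np.empty(n_variants)
for j in range(n_variants):
    cases = genotypes[phenotype == 1, j]
    controls = genotypes[phenotype == 0, j]
    # 2x2 allele-count table: each person carries two alleles.
    table = np.array([
        [cases.sum(), 2 * len(cases) - cases.sum()],
        [controls.sum(), 2 * len(controls) - controls.sum()],
    ])
    p_values[j] = stats.chi2_contingency(table).pvalue

# Bonferroni-corrected threshold; surviving variants are the "hits"
# that then need the annotation and interpretation described above.
hits = np.flatnonzero(p_values < 0.05 / n_variants)
print(f"{hits.size} candidate variants to interpret")
```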

This might sound too good to be true—can we be sure of the accuracy of this kind of AI analysis? The Biomni team has validated the system through several case studies in different fields. In one, Biomni designed a molecular cloning experiment; in a blind evaluation, the protocol and design matched that of a postdoc with more than five years of experience. In another, Biomni analyzed over 450 files of wearable-sensor data (a mix of continuous glucose monitoring, temperature, and physical activity) from 30 people in just 35 minutes—a task estimated to take a human expert three weeks. In a third, Biomni analyzed gene activity data from over 336,000 individual cells taken from human embryonic tissue. The system confirmed regulatory relationships scientists already knew about, but also identified new transcription factors—proteins that control when genes turn on and off—that researchers hadn’t previously connected to human embryonic development.

Biomni isn’t a perfect system, which is why it includes guardrails to detect if Claude has gone off-track. Nor can it yet do everything out of the box. Where it comes up short, however, experts can encode their methodology as a skill—teaching the agent how an expert would approach a problem rather than letting it improvise. For example, when working with the Undiagnosed Diseases Network on rare disease diagnosis, the team found that Claude’s default approach differed substantially from what a clinician would do. So they interviewed an expert, documented the diagnostic process step by step, and taught it to Claude. With that previously tacit knowledge made explicit, the agent performed well.
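
To give a flavor of what encoding expertise can look like in practice, here is a minimal sketch using the Anthropic Python SDK, in which the expert’s documented procedure is supplied as standing instructions rather than left to the model to improvise. The diagnostic steps and the model name are illustrative placeholders, not the actual Undiagnosed Diseases Network workflow.

```python
# Sketch: an interviewed expert's step-by-step method, written down and
# supplied as standing instructions. All steps below are invented placeholders.
import anthropic

EXPERT_METHOD = """When prioritizing candidate genes for a rare-disease case:
1. Standardize the patient's phenotypes as HPO terms before anything else.
2. Check each candidate against known disease associations rather than
   reasoning from gene function alone.
3. Weigh inheritance pattern and variant zygosity against family history.
4. For every ranked candidate, name the follow-up test that would confirm
   or exclude it."""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-5",    # assumed model name
    max_tokens=2048,
    system=EXPERT_METHOD,       # the "skill": procedure, not improvisation
    messages=[{"role": "user", "content":
               "Patient: ataxia and retinal dystrophy. "
               "Candidates: ARL13B, CEP290, TTN. Prioritize."}],
)
print(response.content[0].text)
```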

Biomni represents one approach: a general-purpose system that brings hundreds of tools under one roof. But other labs are building more specialized systems—targeting specific bottlenecks in their own research workflows.

Cheeseman Lab: automating the interpretation of large-scale gene knockout experiments

When scientists want to understand what a gene does, one approach is to remove it from the cell or organism in question and see what breaks. The gene-editing tool CRISPR, which emerged around 2012, made this easy to do precisely and at scale. But a bottleneck remained downstream: labs could generate far more data than they had the bandwidth to analyze.

This is exactly the challenge faced by Iain Cheeseman’s lab at the Whitehead Institute and the Department of Biology at MIT. Using CRISPR, they knock out thousands of different genes across tens of millions of human cells, then photograph each cell to see what changed. The patterns in those images are revealing: genes that do similar jobs tend to produce similar-looking damage when removed. Software can detect these patterns and group genes together automatically—Cheeseman’s lab built a pipeline called Brieflow (yes, brie as in the cheese) to do exactly this.
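
The grouping step itself is conceptually simple. Here is a minimal sketch with synthetic stand-ins for the per-gene image features a pipeline like Brieflow would extract; the gene names, feature dimensions, and threshold are invented.

```python
# Hierarchical clustering of per-gene phenotypic profiles (synthetic data).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
genes = [f"GENE_{i}" for i in range(500)]   # hypothetical gene names
profiles = rng.normal(size=(500, 128))      # stand-in image-derived features

# Genes whose knockout phenotypes correlate end up in the same cluster.
tree = linkage(pdist(profiles, metric="correlation"), method="average")
labels = fcluster(tree, t=1.2, criterion="distance")

clusters = {}
for gene, label in zip(genes, labels):
    clusters.setdefault(label, []).append(gene)
print(f"{len(clusters)} clusters of phenotypically similar genes")
```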

But interpreting what these gene groupings mean—why the genes cluster together, what they might have in common, whether it’s a known biological relationship or something new—still requires a human expert to comb through the scientific literature, gene by gene. It’s slow. A single screen can produce hundreds of clusters, and most never get investigated simply because labs don’t have the time, bandwidth, or in-depth knowledge about the diverse things that cells do.

For years, Cheeseman did all the interpretation himself. He estimates he can recall the function of about 5,000 genes off the top of his head, but it still takes hundreds of hours to analyze this data effectively. To accelerate this process, PhD student Matteo Di Bernardo sought to build a system that would automate Cheeseman’s approach. Working closely with Cheeseman to understand exactly how he approaches interpretation—what data sources he consults, what patterns he looks for, what makes a finding interesting—they built a Claude-powered system called MozzareLLM (you might be seeing a theme developing here).

It takes a cluster of genes and does what an expert like Cheeseman would do: identifies what biological process they might share, flags which genes are well-understood versus poorly studied, and highlights which ones might be worth following up on. Not only does this substantially accelerate their work, but it is also helping them make important additional biological discoveries. Cheeseman finds Claude consistently catches things he missed. “Every time I go through I’m like, I didn’t notice that one! And in each case, these are discoveries that we can understand and verify,” he says.

What helps make MozzareLLM so useful is that it isn’t a one-trick pony: it can incorporate diverse information and reason like a scientist. Most notably, it attaches confidence levels to its findings, which Cheeseman emphasizes is crucial: that signal helps him decide whether to invest more resources in following up on a given conclusion.
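
As a rough illustration of what such an annotation query might look like, here is a sketch using the Anthropic Python SDK. It is not MozzareLLM’s actual implementation; the gene cluster, prompt, and model name are assumptions.

```python
# Annotate one gene cluster and request an explicit confidence field.
# A verbalized confidence is a triage signal, not a calibrated probability.
import json
import anthropic

client = anthropic.Anthropic()
cluster = ["CENPA", "CENPC", "MIS12", "NDC80", "KNL1"]  # example genes

response = client.messages.create(
    model="claude-opus-4-5",  # assumed model name
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "These genes co-clustered in an image-based CRISPR knockout "
            f"screen: {', '.join(cluster)}. Return only JSON with keys "
            "'shared_process', 'understudied_genes', 'follow_up', and "
            "'confidence' (low/medium/high, with a one-line reason)."
        ),
    }],
)
annotation = json.loads(response.content[0].text)  # assumes bare JSON back
print(annotation["shared_process"], "|", annotation["confidence"])
```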

In building MozzareLLM, Di Bernardo tested multiple AI models. Claude outperformed the alternatives—in one case correctly identifying an RNA modification pathway that other models dismissed as random noise.

Cheeseman and Di Bernardo envision making these Claude-annotated datasets public—letting experts in other fields follow up on clusters their lab doesn’t have time to pursue. A mitochondrial biologist, for instance, could dive into mitochondrial clusters that Cheeseman’s team has flagged but never investigated. As other labs adopt MozzareLLM for their own CRISPR experiments, it could accelerate the interpretation and validation of genes whose functions have remained uncharacterized for years.

Lundberg Lab: testing AI-led hypothesis generation for which genes to study

The Cheeseman lab uses optical pooled screening—a technique that lets them knock out thousands of genes in a single experiment. Their bottleneck is interpretation. But not every cell type works with pooled approaches. Some labs, such as the Lundberg Lab at Stanford, run smaller, focused screens, and their bottleneck comes earlier: deciding which genes to target in the first place.

Because a single focused screen can cost upwards of $20,000, and costs increase with size, labs typically target a few hundred genes they think are most likely to be involved in a given condition. The conventional process involves a team of grad students and postdocs sitting around a Google spreadsheet, adding candidate genes one by one with a sentence of justification, or maybe a link to a paper. It’s an educated guessing game, informed by literature reviews, expertise, and intuition, but constrained by human bandwidth. It’s also fallible, resting as it does on what other scientists have already figured out and written down, and on what the humans in the room happen to recall.

The Lundberg Lab is using Claude to flip that approach. Instead of asking “what guesses can we make based on what researchers have already studied?”, their system asks “what should be studied, based on molecular properties?”

The team built a map of every known molecule in the cell—proteins, RNA, DNA—and how they relate to each other. They mapped out which proteins bind together, which genes code for which products, and which molecules are structurally similar. They can then give Claude a question—for instance, which genes might govern a particular cellular structure or process—and Claude navigates that map to identify candidate genes based on their biological properties and relationships.
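
Stripped to its essentials, the idea is a typed graph that can be walked outward from known biology. Here is a toy sketch; the genes, relations, and two-hop rule are hand-built stand-ins, not the lab’s actual map or traversal logic.

```python
# A toy molecular relationship map and a naive candidate-nomination walk.
import networkx as nx

g = nx.Graph()
g.add_edge("IFT88", "IFT52", relation="binds")          # protein-protein
g.add_edge("IFT88", "ARL13B", relation="same_pathway")
g.add_edge("ARL13B", "INPP5E", relation="binds")
g.add_edge("CEP290", "IFT88", relation="localizes_with")
g.add_edge("INPP5E", "PDE6D", relation="binds")

# Walk outward from one known cilia gene; nearby nodes become candidates
# for a focused screen, ranked here simply by graph distance.
seed = "IFT88"
hops = nx.single_source_shortest_path_length(g, seed, cutoff=2)
ranked = sorted((d, gene) for gene, d in hops.items() if gene != seed)
print([gene for d, gene in ranked])
```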

The Lundberg lab is currently running an experiment to test how well this approach works. To do so, they needed a topic where very little research had been done (with something well-studied, Claude might simply recall the established findings). They chose primary cilia: antenna-like appendages on cells that we still know little about but which are implicated in a variety of developmental and neurological disorders. Next, they’ll run a whole-genome screen to see which genes actually affect cilia formation, establishing the ground truth.

The test is to compare human experts to Claude. The humans will use the spreadsheet approach to make their guesses; Claude will generate its own using the molecular relationship map. If Claude catches (hypothetically) 150 out of 200 true hits and the humans catch 80 out of 200, that’s strong evidence the approach works better. Even if the two are about equal at discovering the genes, Claude likely works much faster, which alone could make the whole research process more efficient.
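
In code, the planned comparison reduces to a recall calculation against the screen’s ground truth. Here is a sketch using the hypothetical numbers above.

```python
# Compare candidate lists against ground-truth hits (toy numbers from the text).
ground_truth = set(range(200))               # 200 genes the screen confirms
claude_picks = set(range(150)) | {900, 901}  # hypothetical predictions
human_picks = set(range(80)) | {950, 951}

def recall(picks: set, truth: set) -> float:
    return len(picks & truth) / len(truth)

print(f"Claude recall: {recall(claude_picks, ground_truth):.0%}")  # 75%
print(f"Human recall:  {recall(human_picks, ground_truth):.0%}")   # 40%
```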

If the approach works, the team envisions it becoming a standard first step in focused perturbation screening. Instead of gambling on intuition or using brute-force approaches that have become prevalent in contemporary research, labs could make informed bets about which genes to target—getting better results without needing the infrastructure for whole-genome screening.

Looking forward

None of these systems are perfect. But they point to the ways that in just a few short years scientists have begun to incorporate AI as a research partner capable of far more than basic tasks—indeed, increasingly able to speed up, and in some cases even replace, many different aspects of the research process.

In speaking with these labs, a common theme emerged: the usefulness of the tools they’ve built continues to grow in concert with AI capabilities. Each model release brings noticeable improvements. Where models just two years ago were limited to writing code or summarizing papers, more powerful agents have begun, if slowly, to replicate the very work those papers describe.

As tools advance and AI models continue to grow more intelligent, we’re continuing to watch and learn from how scientific discovery develops along with them.

For more detail on the expanded Claude for Life Sciences capabilities, see here, and our tutorials here. We’re also continuing to accept applications to our AI for Science program. Applications will be reviewed by our team, including subject matter experts in relevant fields.



Comments

  • By jadenpeterson 2026-01-18 5:16 (7 replies)

    Not to be a luddite, but large language models are fundamentally not meant for tasks of this nature. And listen to this:

    > Most notably, it provides confidence levels in its findings, which Cheeseman emphasizes is crucial.

    These 'confidence levels' are suspect. You can ask Claude today, "What is your confidence in __" and it will, unsurprisingly, give a 'confidence interval'. I'd like to better understand the system implemented by Cheeseman. Otherwise I find the whole thing, heh, cheesy!

    • By isoprophlex 2026-01-18 7:24 (4 replies)

      I've spent the last ~9 months building a system that, amongst other things, uses a vLLM to classify and describe >40 million images of house number signs across all of Italy. I wish I was joking, but that aside.

      When asked about their confidence, these things are almost entirely useless. If the Magic Disruption Box is incapable of knowing whether or not it read "42/A" correctly, I'm not convinced it's gonna revolutionize science by doing autonomous research.

      • By bob1029 2026-01-18 8:04 (1 reply)

        How exactly are we asking for the confidence level?

        If you give the model the image and a prior prediction, what can it tell you? Asking for it to produce a 1-10 figure in the same token stream as the actual task seems like a flawed strategy.

        • By kelipso 2026-01-18 15:51 (2 replies)

          I’m not saying the LLM will give a good confidence value (maybe it will, maybe it won’t; it would depend on its training), but why is making it produce the confidence value in the same token stream as the actual task a flawed strategy?

          That’s how typical classification and detection CNNs work. Class and confidence value along with bounding box for detection CNNs.

          • By hexaga 2026-01-18 18:49 (1 reply)

            Because it's not calibrated to be. In LLMs, next-token probabilities are calibrated: the training loss drives them to be accurate. Likewise in typical classification models for images or w/e else. It's not beyond possibility to train a model to give confidence values.

            But the second-order 'confidence as a symbolic sequence in the stream' is only (very) vaguely tied to this. Numbers-as-symbols are of a different kind than numbers-as-next-token-probabilities. I don't doubt there is _some_ relation, but it's too much inferential distance away and thus worth almost nothing.

            With that said, nothing really stops you from finetuning an LLM to produce accurately calibrated confidence values as symbols in the token stream. But you have to actually do that, it doesn't come for free by default.
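
            (To make that concrete, here's a minimal sketch of where the calibrated quantity actually lives, using HF transformers with a small stand-in model; the verbalized "confidence: 8/10" asked about upthread is generated from these distributions but isn't itself trained to match them.)

              import torch
              from transformers import AutoModelForCausalLM, AutoTokenizer

              tok = AutoTokenizer.from_pretrained("gpt2")    # stand-in model
              model = AutoModelForCausalLM.from_pretrained("gpt2")

              ids = tok("The house number sign reads 42/", return_tensors="pt").input_ids
              with torch.no_grad():
                  logits = model(ids).logits[0, -1]          # next-token logits
              probs = torch.softmax(logits, dim=-1)          # calibrated by the training loss
              top = torch.topk(probs, 3)
              for p, i in zip(top.values, top.indices):
                  print(f"{tok.decode(i)!r}: {p:.3f}")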

            • By kelipso 2026-01-19 0:46

              Yeah, I agree you should be able to train it to output confidence values; restricting confidence to integers from 0 to 9 especially should make it so it won't be as confused.

          • By bob1029 2026-01-18 17:15 (1 reply)

            CNNs and LLMs are fundamentally different architectures. LLMs do not operate on images directly; the images need to be transformed into something that can ultimately be fed in as tokens. The ability to produce a confidence figure isn't possible until we've reached the end of the pipeline and the vision encoder has already done its job.

            • By kelipso 2026-01-19 0:44

              The images get converted to tokens using the vision encoder, but the tokens are just embedding vectors. So it should be able to, if you train it.

              CNNs and LLMs are not that different. You can train an LLM architecture to do the same thing that CNNs do with a few modifications, see Vision Transformers.

      • By anal_reactor 2026-01-18 11:32 (1 reply)

        > If the Magic Disruption Box is incapabele of knowing whether or not it read "42/A" correctly

        Are you implying that science done by humans is entirely error-free?

        • By mxkopy 2026-01-18 13:43 (1 reply)

          There exists human research that is worse than AI slop. There is no AI research worthy of the Nobel prize

      • By ben_w 2026-01-18 12:03

        Yes and no at the same time, depending on what you intend to get from asking. I don't know what you were doing with this project, obviously, so I don't speak to that, but science (well, stats in general, but science needs stats) has a huge dependency on being sure the question was the correct one and not just rhyming.

        Reading hand-written digits was the 'hello world' of AI well before LLMs came along. I know, because I did it well before LLMs came along.

        Obviously a simple model itself can't know if it's right or wrong, as per one of Wittgenstein's quotes:

          If there were a verb meaning 'to believe falsely', it would not have any significant first person, present indicative.
        
        That said, IMO not (as Wittgenstein seemed to have been claiming) impossible, as at the very least human brains are not single monolithic slabs of logic: https://www.lesswrong.com/posts/CFbStXa6Azbh3z9gq/wittgenste...

        In the case of software, whatever system surrounds this unit of machine classification (be it scripts or more ML) can know how accurately this unit classifies things in certain conditions. In my own MNIST-hello-world example, you split the data into a training set and a test set, and the test set tells you (roughly!) how good the training was: while this still won't tell you if any given answer is wrong, it will tell you how many of those 40 million are probably wrong.

        Humans and complex AI can, in principle, know their own uncertainty. For example, I currently estimate my knowledge of physics to be around the level of a first-year undergraduate, because I have looked at what gets studied in the first year and some past papers, and most of it is not surprising (just don't ask me which one is a kaon and which one is a pion).

        Unfortunately "capable" doesn't mean "good", and indeed humans are also pretty bad at this, the general example is Dunning Kruger, and my personal experience of that from the inside is that I've spent the last 7.5 years living in Germany, and at all points I've been sure (with evidence, even!) that my German is around B1 level, and yet it has also been the case that with each passing year my grasp of the language has improved, so what I'm really sure of is that I was wrong 7 years ago, but I don't know if I still am or not, and will only find out at the end of next month when I get the results of an exam I have yet to sit.

      • By Yajirobe 2026-01-18 8:38 (1 reply)

        A blind mathematician can do revolutionary work despite not being able to see

    • By red75prime 2026-01-18 6:20 (1 reply)

      > large language models are fundamentally not meant for tasks of this nature

      There should be some research results showing their fundamental limitations. As opposed to empirical observations. Can you point at them?

      What about VLMs, VLAs, LMMs?

      • By utopiah 2026-01-18 6:37 (1 reply)

        Old "agged Technological Frontier" but explains a bit the challenge https://www.hbs.edu/faculty/Pages/item.aspx?num=64700 namely... it's hard and the lack of reproducibility (models getting inaccessible to researcher quickly) makes this kind of studies very challenging.

        • By red75prime 2026-01-18 6:54

          That is an old empirical study. jadenpeterson was talking about some fundamental limitations of LLMs.

    • By post_below 2026-01-18 7:02 (3 replies)

      Finding patterns in large datasets is one of the things LLMs are really good at. Genetics is an area where scientists have already done impressive things with LLMs.

      However you feel about LLMs (and I'm guessing you're not a fan), you don't have to use them for very long before you witness how useful they can be with large datasets; they are undeniably incredible tools in some areas of science.

      https://news.stanford.edu/stories/2025/02/generative-ai-tool...

      https://www.nature.com/articles/s41562-024-02046-9

      • By catlifeonmars 2026-01-18 7:43 (1 reply)

        In reference to the second article: who cares? What we care about is experimental verification. I could see maybe accurate prediction being helpful in focusing funding, but you still gotta do the experimentation.

        Not disagreeing with your initial statement about LLMs being good at finding patterns in datasets btw.

        • By wasabi991011 2026-01-18 18:09 (1 reply)

          This is also true of lots of human research, there's always a theory side of research that guides the experimental side. Even if just informal, experimental researchers have priors for what experimental verification they should attempt.

          • By kelipso 2026-01-19 1:02 (1 reply)

            Yeah, there’s an infinite number of experiments you could run, but obviously infinite resources don’t exist, so you need theory to guide where to look. For example, using computational methods in bioinformatics to guess a protein’s function so that experimental researchers can verify it (which takes weeks to months per hypothesis) is an entire field.

            • By catlifeonmars 2026-01-19 5:57

              You need to search in both likely and unlikely places. This is pretty common in high dimensional search spaces. Searching only in the most likely places gets you stuck in local minima

      • By refurb 2026-01-18 7:15

        As a scientist, the two links you provided are severely lacking in utility.

        The first developed a model to calculate protein function based on DNA sequence, yet provides no results from testing the model. Until it does, it’s no better than the hundreds of predictive models thrown on the trash heap of science.

        The second tested a model’s “ability to predict neuroscience results” (which reads really oddly). How did they test it? They pitted humans against LLMs in determining which published abstracts were correct.

        Well yeah? That’s exactly what LLMs are good at - predicting language. But science is not advanced by predicting which abstracts of known science are correct.

        It reminds me of my days working with computational chemists - we had an x-ray structure of the molecule bound to the target. You can’t get much better than that for hard, objective data.

        “Oh yeah, if you just add a methyl group here you’ll improve binding by an order of magnitude”.

        So we went back to the lab, spent a week synthesizing the molecule, sent it to the biologists for a binding study. And the new molecule was 50% worse at binding.

        And that’s not to blame the computational chemist. Biology is really damn hard. Scientists are constantly being surprised by results that contradict current knowledge.

        Could LLMs be used in the future to help come up with broad hypotheses in new areas? Sure! Are the hypotheses going to prove fruitless most of the time? Yes! But that’s science.

        But any claim of a massive leap in scientific productivity (whether LLMs or something else) should be taken with a grain of salt.

      • By troupo 2026-01-18 7:14 (1 reply)

        > Finding patterns in large datasets is one of the things LLMs are really good at.

        Where by "good at" you mean "are totally shit at"?

        They routinely hallucinate things even on tiny datasets like codebases.

        • By post_below 2026-01-18 7:37 (1 reply)

          I don't follow the logic that "it hallucinates so it's useless". In the context of codebases I know for sure that they can be useful. Large datasets too. Are they also really bad at some aspects of dealing with both? Absolutely. Dangerously, humorously bad sometimes.

          But the latter doesn't invalidate the former.

          • By troupo 2026-01-18 9:23

            > I don't follow the logic that "it hallucinates so it's useless".

            I... don't even know how to respond to that.

            Also. I didn't say they were useless. Please re-read the claim I responded to.

            > Are they also really bad at some aspects of dealing with both? Absolutely. Dangerously, humorously bad sometimes.

            Indeed.

            Now combine "Finding patterns in large datasets is one of the things LLMs are really good at." with "they hallucinate even on small datasets" and "Are they also really bad at some aspects of dealing with both? Absolutely. Dangerously, humorously bad sometimes"

            Translation, in case logic somehow eludes you: if an LLM finds a pattern in a large dataset given that it often hallucinates, dangerously, humorously bad, what are the chances that the pattern it found isn't a hallucination (often subtle one)?

            Especially given the undeniable verifiable fact that LLMs are shit at working with large datasets (unless they are explicitly trained on them, but then it still doesn't remove the problem of hallucinations)

    • By eurekin 2026-01-18 8:15

      I made a toy order item cost extractor out of my pile of emails. Claude added confidence percentage tracking and it couldn't be more useless.

    • By vimda 2026-01-18 6:19 (1 reply)

      This is what Yann LeCun means when he talks about how research is at a dead end at the moment, with everyone all in on LLMs to a fault

      • By agumonkey 2026-01-18 7:34

        I'm just a noob, but LeCun seems obsessed with the idea of world models, which I assume means a more rigorous physical approach, and I don't understand (again, confused noob here) how that would help precise abstract thinking.

    • By 3836293648 2026-01-19 8:17

      LLMs do typically encode a confidence level in their embeddings, they just never use it when asked. There were multiple papers on this a few years back and they got reasonable results out of it. I think it was in the GPT3.5 era though

    • By djtango 2026-01-18 5:31 (1 reply)

      Can't LLMs be fed the entire corpus of literature to synthesise (if not "insight") useful intersections? Not to mention much better search than what was available when I was a lowly grad...

      • By fatherwavelet 2026-01-18 13:04 (1 reply)

        I use Gemini almost obsessively but I don't think feeding the entire corpus of a subject would work great.

        The problem is that so much of the consensus is wrong, and it is going to start by giving you the consensus answer on anything.

        There are subjects I can get it to tell me the consensus answer then say "what about x" and it completely changes and contradicts the first answer because x contradicts the standard consensus orthodoxy.

        To me it is not much different than going to the library to research something. The library is not useless because the books don't read themselves or because there are numerous books on a subject that contradict each other. Gaining insight from reading the book is my role.

        I suspect much LLM criticism is from people who neither much use LLMs nor learn much of anything new anyway.

        • By djtango 2026-01-18 14:41

          I never suggested I want an LLM to be the definitive answer to a question. But I'm certain there's a lot of low-hanging fruit across disciplines where the limit is one field's awareness of another field's work, and the limiting factor is the friction in discovery. I can't see how a specialised research tool powered by LLMs and RAG wouldn't be a net gain for research, if only to generate promising new leads.

          Throwing compute to mine a search space seems like one of the less controversial ways to use technology...

  • By alsetmusic 2026-01-18 4:46 (3 replies)

    Call me when a disinterested third-party says so. PR announcements by the very people who have a large stake in our belief in their product are unreliable.

    • By joshribakoff 2026-01-18 5:51 (1 reply)

      This company predicts software development is a dead occupation yet ships a mobile chat UI that appears to be perpetually full of bugs, and has had a number of high profile incidents.

      • By simonw 2026-01-18 6:17 (2 replies)

        "This company predicts software development is a dead occupation"

        Citation needed?

        Closest I've seen to that was Dario saying AI would write 90% of the code, but that's very different from declaring the death of software development as an occupation.

        • By falloutx 2026-01-18 9:25 (1 reply)

          The clear disdain he has for the profession is evident in any interview he gives. Him saying AI would write 90% of the code was not a signal to us; it was directed at his fellow execs, telling them they can soon get rid of 90% of the engineers and some other related professions.

          • By throw234234234 2026-01-19 5:28

            I think it's pretty clear that Anthropic was the main AI lab pushing code automation right from the start. Their blog posts, everything, just targeted code generation; even their headings for new models in articles would be "code". My view is that if they weren't around, code would still have been solved eventually, but in cadence with other use cases (i.e. gradually, as per general demand).

            AI engineers aren't actually SWEs per se; they use code but see it as tedious non-main work, IMO. They are happy to automate their complement and rise in status vs SWEs, who typically, before all of this, had more employment opportunities and more practical ways to show value.

        • By throw310822 2026-01-18 12:21

          AI is already writing 90% of my code. 100% of Claude Code's code, too. So Amodei was right.

    • By NewsaHackO 2026-01-18 5:14 (3 replies)

      Is your argument that the quotes by the researchers in the article are not real?

      • By taormina 2026-01-18 5:21 (1 reply)

        What quotes? This is an AI summary that may or may not have summarized actual quotes from the researchers, but I don't see a single quote in this article, or a source.

        • By famouswaffles 2026-01-18 5:35 (1 reply)

          Why are you commenting if you can't even take a few minutes to read this? It's quite bizarre. There's a quote and repo for Cheeseman, and a paper for Biomni.

          • By WD-42 2026-01-18 5:44 (1 reply)

            There is only one quote in the entire article, though:

            > Cheeseman finds Claude consistently catches things he missed. “Every time I go through I’m like, I didn’t notice that one! And in each case, these are discoveries that we can understand and verify,” he says.

            Pretty vague and not really quantifiable. You would think an article making a bold claim would contain more than a single, hand-wavy quote from an actual scientist.

            • By famouswaffles 2026-01-18 6:07 (1 reply)

              >Pretty vague and not really quantifiable. You would think an article making a bold claim would contain more than a single, hand-wavy quote from an actual scientist.

              Why? What purpose would quotes serve better than a paper with numbers and code? Just seems like nitpicking here. The article could have gone without a single quote (or had several more) and it wouldn't really change anything. And that quote is not really vague in the context of the article.

      • By catlifeonmars 2026-01-18 7:54

        The point is to look at who is making a claim and asking what they hope to gain from it. This is orthogonal to what the thing is, really. It’s just basic skepticism.

        Even if the article is accurate, it still makes sense to question the motives of the publisher. Especially if they’re selling a product.

      • By inferiorhuman 2026-01-18 5:39 (2 replies)

        [flagged]

        • By simonw 2026-01-18 6:19

          Most people aren't software developers. The HN audience can benefit from LLMs in ways that many people don't value.

        • By bpodgursky 2026-01-18 5:45 (4 replies)

          Are you accusing Anthropic of hallucinating an MIT lab under the MIT domain? I mean they literally link to it https://cheesemanlab.wi.mit.edu/

          • By falloutx 2026-01-18 9:30

            And if you go to that site, the researchers say nothing about using Claude, or any LLMs for that matter.

          • By NewsaHackO 2026-01-18 5:51

            Honestly, it doesn't even seem they read the article, just came in, saw it was pro-AI, and commented.

          • By famouswaffles 2026-01-18 5:47

            You know things have shifted a gear when people just start flat out denying reality.

          • By inferiorhuman 2026-01-18 5:50

            [flagged]

    • By signatoremo 2026-01-18 18:29

      > Call me when a disinterested third-party says so

      Saying what? This describes three projects that use an Anthropic’s product. Do you need a third party to confirm that? Or do you need someone to tell you if they are legit?

      There are hundreds of announcements by vendors on HN. Did you object to them all, or only to the ones that go against your own beliefs?

  • By falloutx 2026-01-18 9:22

    Of course this comes from Anthropic PR. Stanford basically has a stake in LLM and AI hype, so no wonder they're the most receptive.
