Why RDF is the natural knowledge layer for AI systems

2025-09-05 · demo.bryon.io

Part 1 of 6 in the series “LLMs Need Knowledge Graphs. Use RDF or End Up Rebuilding It.”

Bryon Jacob

The Big Picture: Knowledge graphs triple LLM accuracy on enterprise data. But here’s what nobody tells you upfront: every knowledge graph converges on the same patterns, the same solutions. This series reveals why RDF isn’t just one option among many — it’s the natural endpoint of knowledge representation. By Post 6, you’ll see real enterprises learning this lesson at great cost — or great savings.

The Knowledge Layer Revolution

Your AI is struggling with your data. You know this because you’ve watched it happen: confident answers that are completely wrong, hallucinations about basic facts, an inability to connect information from different systems.

You’re not alone. When large language models try to answer business questions using enterprise SQL databases, errors are common. Without additional context and structure, LLMs often struggle to interpret schemas and relationships correctly.

But something remarkable happens when you add a knowledge layer between your data and your AI. When that same data is transformed into a knowledge graph, accuracy more than triples. The improvement is dramatic.

This finding comes from research my colleagues (Juan Sequeda and Dean Allemang) and I published together (“Benchmarking the Abilities of LLMs for Supporting Enterprise Knowledge Graph Construction from Relational Databases”, 2023). We discovered that LLMs perform dramatically better with knowledge graphs — the structure aligns naturally with how they process information.

When teams embark on building a knowledge layer, they face a critical early decision: use the established RDF standards, or build something custom. Many choose to build their own solution, viewing RDF as overly complex or academic. They start with property graphs, custom schemas, or proprietary platforms that promise quick wins.

But I’ve spent years working at the intersection of knowledge representation and AI, watching these projects evolve. The pattern is remarkably consistent. Teams that choose not to use RDF inevitably find themselves rebuilding its core features: global identifiers for entities, protocols for data federation, ways to express relationships and metadata consistently. What starts as “we’ll keep it simple” becomes “we need a canonical ID system” becomes “we’re building our own semantic layer.”

Uber discovered this after building their own graph system. Neo4j reversed course after years of positioning against RDF. The market has spoken: you need these capabilities. The only question is whether you’ll build them yourself or use what already exists.

This series reveals why RDF isn’t just another technology choice — it’s the natural endpoint of knowledge representation. Not because of ideology or standards bodies, but because the problems of representing knowledge at scale force convergent evolution.

Let me show you why, starting with the most fundamental challenge every knowledge layer must solve.

Why LLMs Struggle with Traditional Databases

LLMs are pattern-matching machines trained on natural language. When they encounter a SQL schema, they’re forced to:

  • Guess what cust_id vs customer_id vs custID mean
  • Infer relationships from cryptic foreign key names
  • Navigate ambiguous table names (is orders for customer orders or supply orders?)
  • Understand domain-specific abbreviations without context

The result is poor performance — not because LLMs are bad at reasoning, but because SQL schemas optimize for storage efficiency rather than semantic clarity.

You can improve SQL schemas for semantic clarity — using descriptive names, normalizing relationships properly, maintaining clean metadata. But this requires constant discipline, adds significant overhead, and fights against SQL’s natural optimization patterns. Database administrators rightfully focus on performance and maintainability, leading to denormalization, cryptic but efficient column names, and other practices that prioritize machine efficiency over semantic clarity. Even with perfect discipline, SQL’s fundamental separation of data (in tables) from metadata (in schemas) makes it harder for AI systems to understand how the model evolves. When your knowledge representation is spread across DDL statements, foreign key constraints, and actual data, LLMs struggle to build a coherent semantic picture.

Knowledge graphs, on the other hand, are organized the way we actually think about facts and relationships. They represent knowledge directly, not as a “projection” into tables and columns. While you can store facts in relational databases, you’re always forcing a graph-shaped understanding into a table-shaped container.

The Pattern Every Enterprise Follows When Building a Knowledge Graph

Watch for this progression in your organization:

  1. “We need a knowledge graph for our AI”
  2. “RDF seems too complex, let’s use property graphs”
  3. “We need global identifiers for our merger”
  4. “How do we federate queries across departments?”
  5. “Our custom solution is becoming unmaintainable”
  6. “Maybe we should have used RDF from the start”

This series will show you why this pattern is inevitable — and how to skip to the end.

Why Knowledge Graphs Change Everything

Knowledge graphs represent information the way LLMs (and humans) “think”:

  • Explicit relationships: No guessing what foreign keys mean
  • Rich context: Every entity and relationship can be described
  • Natural language alignment: Triples mirror subject-verb-object sentences
  • Semantic clarity: Types, hierarchies, and constraints are explicit

As Dan Bennett explains in his excellent primer on knowledge graphs, “We can state anything about anything using this model” — and crucially, “A single row is meaningful. It contains a single fact.” This isn’t just a technical preference — it’s about fundamental representation. Knowledge graphs store the atomic truths about your business directly, while relational databases require reconstructing those truths from scattered pieces. When an LLM can traverse relationships explicitly rather than inferring them from column names, accuracy triples. The knowledge graph becomes a bridge between human meaning and machine processing.
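
To see that alignment concretely, here is a minimal Turtle sketch; the ex: namespace and the property names are invented for illustration and are not taken from the research:

@prefix ex: <http://example.com/> .

# Each triple is one atomic fact, readable as a sentence
ex:alice-johnson ex:worksFor ex:acme-corp .       # "Alice Johnson works for Acme Corp"
ex:acme-corp     ex:basedIn  ex:austin .          # "Acme Corp is based in Austin"
ex:alice-johnson ex:manages  ex:ai-initiative .   # "Alice Johnson manages the AI initiative"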

The Knowledge Graph Gold Rush… and Its Hidden Challenge

The 3x accuracy improvement has triggered a gold rush. Enterprises are racing to build knowledge graphs. But here’s what the research papers don’t always mention: building a production knowledge graph requires solving fundamental problems that have existed since humans started organizing information.

And this is where our story really begins.

The First Problem: Identity

Knowledge graphs must answer a deceptively simple question: “How do we know two things are the same thing?”

It starts innocently enough. Customer #12345 in your sales system needs to match up with cust_12345 in your support system. But then it gets messier:

  • When an LLM sees “Apple” in your data, is it the fruit or the company?
  • Is employee “A. Johnson” the same as “Alice Johnson” in HR?
  • When you reference Database → Schema → Table → Column, which specific column across all your systems?

Without solving identity, you get:

  • Data silos that refuse to talk to each other
  • Integration projects that never truly end
  • LLMs hallucinating because they can’t distinguish between entities

Every graph database, every knowledge graph platform, every enterprise data mesh must solve this. And RDF solved it 25 years ago by building on the architecture of the most successful distributed system ever created — the World Wide Web.

Enter IRIs: The Web’s Gift to Data

The solution has been staring us in the face since the invention of the web itself: Internationalized Resource Identifiers (IRIs). Just as URLs gave us a way to uniquely identify any document on the web, IRIs give us a way to uniquely identify anything at all.

Here’s what this looks like in practice:

# IRIs provide globally unique identifiers
tc:employee-alice-johnson a :Employee ;
    :name "Alice Johnson" ;
    :employeeId "E12345" .

# Different system, same person - unified by IRI
dir:staff-ajohnson owl:sameAs tc:employee-alice-johnson .

Notice how this reads almost like English sentences? That’s not an accident: RDF’s triple structure mirrors how we naturally express facts.

The keen-eyed reader might notice these identifiers don’t look like typical URLs. We’re using prefixed names (like tc:employee-alice-johnson) that expand to full IRIs (like <http://timecard.example.com/employee-alice-johnson>). Think of it like using domain names instead of IP addresses—both point to the same place, but one is much easier for humans to work with.
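
For completeness, the prefix declarations behind examples like these look roughly as follows; the timecard and directory namespace IRIs are assumptions chosen to match the expansion shown above:

@prefix tc:  <http://timecard.example.com/> .
@prefix dir: <http://directory.example.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# With these declarations, tc:employee-alice-johnson expands to
# <http://timecard.example.com/employee-alice-johnson>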

The magic isn’t in the syntax; it’s in the properties:

Global Uniqueness: By using domain-based namespacing, collisions become virtually impossible. Your customer #12345 at data.example.com will never be confused with someone else's customer #12345.

Dereferenceable: IRIs can be designed to return more information when accessed, following web architecture principles. While not automatic, making your IRIs dereferenceable is a semantic web best practice that elegantly bridges your knowledge graph with the existing infrastructure of the web. Just as clicking a link can take you to a webpage, systems can potentially follow well-designed IRIs to discover more context.

Hierarchical: IRIs naturally organize into hierarchies (/customer/12345/orders/...). These structured IRIs are invaluable for humans (and AI!) to quickly understand what they represent. But, and this is crucial, you should never parse them programmatically. The hierarchical structure is a scheme for generating meaningful identifiers and making them readable, but machines should treat them as opaque strings.

International: Unlike traditional URIs, IRIs support the full range of Unicode characters. Your customers in Tokyo, Moscow, and Cairo can all have identifiers in their own scripts.
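
As a small illustrative sketch (the namespaces and names here are invented), the same machinery handles non-Latin scripts directly:

@prefix :   <http://data.example.com/ns#> .
@prefix jp: <http://data.example.jp/顧客/> .

# A customer identified and described in Japanese - no special cases required
jp:田中太郎 a :Customer ;
    :name "田中太郎"@ja .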

The Build-vs-Buy Moment Every Enterprise Faces

At this point, you might be thinking: “We don’t need all this. We’ll just build a simple mapping table.”

Let me save you three years and several million dollars. Here’s how it actually plays out:

Year 1: “We’ll just map customer IDs between systems” ($500K, 2 engineers)

  • Build a mapping table
  • Works great for 2–3 systems
  • The solution seems complete

Year 2: “We need to handle entities beyond customers” ($2M, 5 engineers)

  • Extend to products, employees, locations
  • Mapping tables multiply
  • Performance degrades
  • Hire more engineers

Year 3: “We need globally unique identifiers” ($5M total, still not done)

  • Invent your own URI scheme
  • Build a resolution service
  • Handle international characters
  • Use or end up reinventing IRIs

The BBC chose differently. They adopted RDF from the start. During the 2010 World Cup, their semantic web platform automatically generated over 700 pages — far more than manual curation would have allowed. By the 2012 Olympics, they expected 10 million page views per day across 10,000 Olympic pages. The result? Dramatically reduced costs while delivering richer content experiences.

I’ve seen this pattern play out several times firsthand, gone through it myself, and heard the same story from veterans with decades of experience. The ending is always the same: organizations converge on globally unique, hierarchical, dereferenceable identifiers. Also known as… IRIs.

Back to Our LLM Problem

Consider this SQL query an LLM might need to construct:

-- LLM has to guess: are these the same customer?
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN crm_records r ON r.cust_num = c.customer_number

The LLM has to infer that customer_id, id, cust_num, and customer_number might refer to the same entity. It's making educated guesses based on naming patterns. Sometimes it's right. Usually (84% of the time, according to the research) it's not.

Now look at the same information in RDF:

# In RDF, identity is explicit
tc:employee-alice-johnson
    org:worksIn facilities:building-west-tower ;
    org:reportsTo tc:employee-bob-smith ;
    foaf:account it:users-ajohnson .

# No guessing needed!

The relationships are explicit. The identities are unambiguous. The LLM doesn’t need to infer; it can simply follow the links.

From Theory to Practice

Starting with IRIs doesn’t require a massive transformation. You can begin simply:

tc:employee-alice-johnson a :Employee ;
    :email "alice.johnson@techcorp.com" ;
    :employeeId "E12345" ;
    :department tc:dept-engineering .

As your system grows, you can connect to other identifiers:

# Link internal and external identifiers
tc:employee-alice-johnson
    owl:sameAs hr:employee-alice-johnson ;
    owl:sameAs dir:staff-ajohnson ;
    rdfs:seeAlso <https://linkedin.com/in/alice-johnson> .

Suddenly, your customer data can connect to your CRM, to social media, to any system that uses IRIs. No integration project required, just shared identity.

Why This Matters for Your LLM Initiative

This accuracy jump isn’t just about having more data. It’s about having unambiguous data. Here’s what proper identity gives LLMs:

Disambiguation: When the LLM sees “Johnson” in a query, it can determine whether you mean alice-johnson, bob-johnson, or other employees with that surname; no guessing required.

Context Traversal: The LLM can follow relationships confidently. “What projects does Alice’s manager oversee?” becomes a simple graph traversal instead of a complex inference problem. Each step of inference is an opportunity for hallucination: even a small error rate compounds dramatically when multiplied across multiple hops. By making these relationships explicit in the graph, we turn risky inference into deterministic traversal.
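
As a rough sketch of what that traversal can look like in SPARQL (the org:oversees property and both namespace IRIs are assumptions made for illustration, not part of the published examples):

PREFIX tc:  <http://timecard.example.com/>
PREFIX org: <http://example.com/org#>

# "What projects does Alice's manager oversee?" - two explicit hops, no guessing
SELECT ?project WHERE {
    tc:employee-alice-johnson org:reportsTo ?manager .
    ?manager org:oversees ?project .
}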

Source Attribution: Every fact can specify its origin. The LLM can qualify its answers: “According to the HR system, Alice reports to Bob, but the project management system shows her working directly with the CTO on the AI initiative.”
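
One common way to carry that origin information is named graphs. Here is a minimal TriG-style sketch in which the graph names, prefixes, and properties are invented for illustration:

@prefix tc:   <http://timecard.example.com/> .
@prefix org:  <http://example.com/org#> .
@prefix proj: <http://projects.example.com/ns#> .
@prefix hr:   <http://hr.example.com/graphs/> .
@prefix pm:   <http://projects.example.com/graphs/> .

# Each source system contributes its facts in its own named graph,
# so every statement carries its origin
hr:org-chart {
    tc:employee-alice-johnson org:reportsTo tc:employee-bob-smith .
}

pm:assignments {
    tc:employee-alice-johnson proj:worksDirectlyWith tc:employee-carol-cto .
}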

The Payoff: Intelligence Emerges

When you solve identity properly, something magical happens:

LLMs can traverse relationships confidently. No more ambiguity about which “customer” or “product” you mean. The IRI is the answer.

Federated queries become natural. IRIs work across system boundaries by design. Your data can live anywhere and still connect.
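
SPARQL 1.1 federation makes this concrete: a single query can combine a local graph with a remote endpoint. In this sketch the endpoint URL and the crm: properties are assumptions:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX tc:  <http://timecard.example.com/>
PREFIX crm: <http://crm.example.com/ns#>

# One query spanning the local HR graph and a remote CRM endpoint
SELECT ?ticket WHERE {
    tc:employee-alice-johnson owl:sameAs ?crmPerson .
    SERVICE <https://crm.example.com/sparql> {
        ?crmPerson crm:openTicket ?ticket .
    }
}

The owl:sameAs link established earlier is what lets the local and remote identifiers meet in one query.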

Knowledge accumulates automatically. New facts enhance rather than confuse. Every system can contribute to the growing understanding.

Provenance is built-in. Every fact can specify who said it, when, and with what confidence. Critical for AI explainability.

This is why knowledge graphs triple LLM accuracy. It’s not about the graph structure alone; it’s about solving identity in a way that eliminates ambiguity.

The Inevitable Convergence

Here’s the uncomfortable truth: complex data systems eventually build these same features:

What You’ll Call It:

  • “Entity Resolution Pipeline”
  • “Master Data Management”
  • “Canonical ID Service”
  • “Universal Resource Registry”

What You’re Actually Building:

  • Globally unique identifiers (IRIs)
  • Namespace management (IRI prefixes)
  • Entity equivalence (owl:sameAs)
  • Distributed resolution (HTTP dereferencing)

The only difference? You’ll spend 2–3 years and millions of dollars building a worse version of what RDF gives you for free.

This isn’t speculation. Look at any mature data platform:

  • Uber spent years building “algebraic property graphs” to avoid RDF, then presented it as a cautionary tale
  • Neo4j went from “RDF is too complex” to maintaining comprehensive RDF toolkits
  • Google’s Knowledge Graph uses RDF under the hood
  • Major platforms converge on the same patterns

Organizations need an identity system. The question becomes whether to build one that works at web scale from day one, or one that will need to be rebuilt when your data outgrows its original scope.

The Choice: Build on RDF or Rebuild RDF?

The proven approach? Start with RDF. Use the battle-tested solution that powers DBpedia, Wikidata, and enterprise knowledge graphs worldwide.

As Juan Sequeda wisely advises in his foreword to the Neo4j whitepaper Knowledge Graphs — Data in Context:

“One of my mantras is don’t boil the ocean. This means that your knowledge graph journey should start simple, be practical, and focus on the business return…”
Source (Neo4j Whitepaper PDF)

But do start with the right foundation. Because those identifiers determine everything else.

Tim Berners-Lee’s first rule of Linked Data couldn’t be simpler:

“Use URIs as names for things.”
Source (W3C Linked Data Principles)

Twenty-five years later, enterprises are still learning this lesson the hard way.

Dean Allemang, reflecting on their research showing 3x improvement in LLM accuracy, summed it up perfectly:

“The bottom line is it works three times better, and that’s pretty cool.”
Source (Knowledge Graph Insights Podcast)

Three times better. That’s the difference between an LLM that frustrates users and one that delivers value. All because you solved identity properly.

The question isn’t whether you’ll build these features. Most enterprises do.

The question is whether you’ll choose to start with the solution that already exists.

Key Takeaways

  1. LLMs triple their accuracy with knowledge graphs: From SQL to knowledge graphs (Sequeda et al., 2023)
  2. Identity is the foundation problem: Every knowledge graph must solve “are these the same thing?”
  3. RDF/IRIs solved this 25 years ago: Global uniqueness, dereferenceability, no central authority
  4. You’ll build these features anyway: Mature data platforms converge on IRI-like solutions
  5. Understanding foundations enables implementation: This series equips you to understand RDF before diving into LLM integration

Next: RDF Triples: Smallest Atom of Meaning, Largest Scope of Use — How do you represent knowledge once you can identify anything? Enter the RDF triple, the atom of meaning that scales to the universe.



Comments (Hacker News)

  • By IanCal 2025-09-05 7:21 (4 replies)

    This seems to miss the other side of why all this failed before.

    RDF has the same problems as the SQL schemas, with information scattered. What fields mean requires documentation.

    There - they have a name on a person. What name? Given? Legal? Chosen? Preferred for this use case?

    You only have one id for apple eh? Companies are complex to model, do you mean apple just as someone would talk about it? The legal structure of entities that underpins all major companies, what part of it is referred to?

    I spent a long time building identifiers for universities and companies (which was taken for ROR later) and it was a nightmare to say what a university even was. What’s the name of Cambridge? It’s not “Cambridge University” or “The university of Cambridge” legally. But it also is the actual name as people use it. The university of Paris went from something like 13 institutes to maybe one to then a bunch more. Are companies locations at their headquarters? Which headquarters?

    Someone will suggest modelling to solve this but here lies the biggest problem:

    The correct modelling depends on the questions you want to answer.

    Our modelling had good tradeoffs for mapping academic citation tracking. It had bad modelling for legal ownership. There isn’t one modelling that solves both well.

    And this is all for the simplest of questions about an organisation - what is it called and is it one or two things?

    • By jtwaleson 2025-09-05 7:43 (2 replies)

      Indeed, I often get the impression that (young) academics want to model the entire world in RDF. This can't work because the world is very ambiguous.

      Using it to solve specific problems is good. A company I work with tries to do context engineering / adding guard rails to LLMs by modeling the knowledge in organizations, and that seems very promising.

      The big question I still have is whether RDF offers any significant benefits for these way more limited scopes. Is it really that much faster, simpler or better to do queries on knowledge graphs rather than something like SQL?

      • By IanCal 2025-09-05 13:14 (1 reply)

        I think it's a journey a lot of us have gone on, it's an appealing idea until you hit a variety of really annoying cases and where you are depends on how you end up trying to solve it. I'm maybe being unfair to the academic side but this is how I've seen it (exaggerated to show what I mean hopefully).

        The more academic side will add more complexity to the modelling, trying to model it all.

        The more business side will add more shortcuts to simplify the modelling, trying to get just something done.

        Neither is wrong as such but I prefer the tendency to focus on solving an actual problem because it forces you to make real decisions about how you do things.

        I think being able to build up knowledge in a searchable way is really useful and having LLMs means we finally have technology that understands ambiguity pretty well. There's likely an excellent place for this now that we can model some parts precisely and then add more fuzzy knowledge as well.

        > The big question I still have is whether RDF offers any significant benefits for these way more limited scopes. Is it really that much faster, simpler or better to do queries on knowledge graphs rather than something like SQL?

        I'm very interested in this too, I think we've not figured it out yet. My guess is probably no in that it may be easier to add the missing parts to non-rdf things. I have a rough feeling that actually having something like a well linked wiki backed by data sources for tables/etc would be great for an llm to use (ignoring cost, which for predictions across a year or more seems pretty reasonable).

        They can follow links around topics across arbitrary sites well, you only need more programmatic access for aggregations typically. Or rare links.

        • By ddkto 2025-09-06 11:12

          The academic / business divide is a great example of the correct model depending on what you want to do. The academic side wants to understand, the business side wants to take action.

          For example, the Viable System Model[1] can capture a huge amount of nuance about how a team functions, but when you need to reorganize a dysfunctional team, a simple org chart and concise role descriptions are much more effective.

          [1] https://en.wikipedia.org/wiki/Viable_system_model

      • By pbronez 2025-09-05 18:11 (1 reply)

        Which company? I need to build an enterprise knowledge graph.

        • By jtwaleson 2025-09-06 9:44

          A small startup in the Netherlands, but they're very much searching for approaches themselves, I don't think they can help you right now.

    • By simonw 2025-09-05 7:35 (2 replies)

      That university example is fantastic.

      I went looking and as far as I can tell "The Chancellor, Masters, and Scholars of the University of Cambridge" is the official name! https://www.cam.ac.uk/about-the-university/how-the-universit...

      • By IanCal 2025-09-05 8:56 (1 reply)

        That's the one! It's not even that weird of a case compared to others but is an excellent example.

        Here's the history of the Paris example: https://en.wikipedia.org/wiki/University_of_Paris where there was one, then many, then fewer universities. Answering a question of "what university is referred to by X" depends on why you want to know, there are multiple possible answers. Again it's not the weirdest one, but a good clear example of some issues.

        There's a company called Merck, and a company called Merck. One Merck is called Merck in the US but MSD outside of it. The other Merck is called Merck outside the US and EMD inside it. Technically one is Merck & Co and used to be part of the other Merck but later wasn't, due to trademark disputes which aren't even all resolved yet.

        This is an area I think LLMs actually have a space to step in, we have tried perfectly modelling everything so we can let computers which have no ability to manage ambiguity answer some questions. We have tried barely modelling anything and letting humans figure out the rest, as they're typically pretty poor at crafting the code, and that has issues. We ended up settling largely on spending a bunch of human time modelling some things, then other humans building tooling around them to answer specific questions by writing the code, and a third set who get to actually ask the questions.

        LLMs can manage ambiguity, and they can also do more technical code based things. We haven't really historically had things that could manage ambiguity like this for arbitrary tasks without lots of expensive human time.

        I am now wondering if anyone has done a graph db where the edges are embedding vectors rather than strict terms.

        • By isoos 2025-09-05 13:04 (1 reply)

          > I am now wondering if anyone has done a graph db where the edges are embedding vectors rather than strict terms.

          Curious: how would you imagine it working if there were such a graph db?

          • By IanCal 2025-09-05 13:23 (1 reply)

            I had the idea a few hours ago so I'm sure there are holes in this but my first idea is forming a graph where the relationship isn't a fixed label but a description that is then embedded as a vector.

            First of all, consider that in a way each edge label is a one-hot binary vector. And we search using only binary methods. A consequence is that anything outside of that very narrow path is missed in a search. A simple step could be to change that to anything within an X similarity to some target vector. Could you then search "(fixed term) is a love interest of b?" and have b? filled from facts like "(fixed term) is intimate with Y" and "(fixed term) has a date with Z"?

            There are probably issues, I'm sure there are, but some blend of querying but with some fuzziness feels potentially useful.

            • By fishmicrowaver 2025-09-05 23:55 (1 reply)

              Isn't this exactly what neo4j does for graphrag?

              • By IanCal 2025-09-06 9:34 (1 reply)

                Is that vectors for edges or for searching the nodes? I’m talking about encoding the edges as vectors for traversal.

      • By muglug 2025-09-06 18:09

        Brb updating my LinkedIn

    • By dwaite 2025-09-05 22:00

      > The correct modelling depends on the questions you want to answer.

      Coincidentally, my main point in any conversation about UML I've ever had

    • By AtlasBarfed 2025-09-05 23:03

      Basically it's name spacing hell right?

      To adapt the saying, an engineer is talking to another engineer about is system, saying he's having issues with names. So he's thinking of using name spaces.

      Now he has two problems

  • By jandrewrogers 2025-09-05 6:29 (1 reply)

    As the article itself points out, this has been around for 25 years. It isn’t an accident that nobody does things this way, it wasn’t an oversight.

    I worked on semantic web tech back in the day, the approach has major weaknesses and limitations that are being glossed over here. The same article touting RDF as the missing ingredient has been written for every tech trend since it was invented. We don’t need to re-litigate it for AI.

    • By rglullis 2025-09-05 6:46 (2 replies)

      I would be very interested in reading why you think it can't work. I am inclined to agree with the post on a sibling thread that mentions that the main problem with RDF is that it has been captured by academia.

      • By FrankyHollywood 2025-09-05 7:21 (1 reply)

        The article states "When that same data is transformed into a knowledge graph"

        This is a non-trivial exercise. How does one transform knowledge into a knowledge graph using RDF?

        RDF is extremely flexible and can represent any data, and that's exactly its great weakness. It's such a free format that there is no consensus on how to represent knowledge. Many academic panels exist to set standards, but many of these efforts end up on GitHub as unmaintained repositories.

        The most important thing about RDF is that everyone needs to agree on the same modeling standards and use the same ontologies. This is very hard to achieve, and leaves room for a lot of discussion, which makes it 'academic' :)

        • By bfuller 2025-09-05 11:04

          >This is a non-trivial exercise. How does one transform knowledge into a knowledge graph using RDF?

          by using the mcp memory knowledge graph tool, which just worked out of the box for my application of turning forum posts into code implementations.

      • By 4ndrewl 2025-09-05 7:16

        IME it's less than a "capture", more that most outside of academia don't have the requisite learning to be able to think in the abstract outside of trivial examples.

  • By rglullis 2025-09-05 6:22 (2 replies)

    Wrote this about one month ago here at https://news.ycombinator.com/item?id=44839132

    I'm completely out of time or energy for any side project at the moment, but if someone wants to steal my idea: please take an LLM and fine-tune it so that it can take any question and turn it into a SPARQL query for Wikidata. Also, make a web crawler that reads a page and turns it into a set of RDF triples or QuickStatements for any new facts that are presented. This would effectively be the "ultimate information organizer" and could potentially turn Wikidata into most people's entry page of the internet.

      • By yorwba 2025-09-05 7:20

        I asked "Which country has the most subway stations?" and got the query

          SELECT ?country (COUNT(*) AS ?stationCount) WHERE {
            ?station wdt:P31 wd:Q928830.
            ?station wdt:P17 ?country.
          }
          GROUP BY ?country
          ORDER BY DESC(?stationCount)
          LIMIT 1
        
        https://query.wikidata.org/#SELECT%20%3Fcountry%20%28COUNT%2...

        which is not unreasonable as a quick first attempt, but doesn't account for the fact that many things on Wikidata aren't tagged directly with a country (P17) and instead you first need to walk up a chain of "located in the administrative territorial entity" (P131) to find it, i.e. I would write

          SELECT ?country (COUNT(DISTINCT ?station) AS ?stationCount) WHERE {
            ?station wdt:P31 wd:Q928830.
            ?station wdt:P131*/wdt:P17 ?country.
          }
          GROUP BY ?country
          ORDER BY DESC(?stationCount)
          LIMIT 1
        
        https://query.wikidata.org/#SELECT%20%3Fcountry%20%28COUNT%2...

        In this case it doesn't change the answer (it only finds 3 more subway stations in China), but sometimes it does.

    • By IanCal 2025-09-05 7:07 (1 reply)

      Even without tuning Claude is pretty solid at this, just give it the sparql endpoint as a tool call. Claude can generate this integration too.

      • By rglullis 2025-09-05 7:15 (1 reply)

        But the idea of tuning the model for this task is to make a model that is more efficient, cheaper to operate and not requiring $BILLIONS of infrastructure going to the hands of NVDA and AMZN.

        • By ako 2025-09-05 7:49

          I've built an mcp for sparql and rdf. Used claude on iphone to turn pictures of archeological site information shields to transcription, to an ontology, to an rdf, to an er-model and sql statements, and then with mcp tool and claude desktop to save the data into parquet files on blobstorage and the ontology graph into a graph database. Then used it to query data from parquet (using duckdb), where sonnet 4 used the rdf graph to write better sql statements. Works quite well. Now in the process of using sonnet 4 to find the optimal system prompt for qwen coder to also handle rdf and sparql: i've given sonnet 4 access to qwen coder through an mcp tool, so it can trial and error different system prompt strategies. Results are promising, but can't compete with the quality of sonnet 4.

          Graph database vendors are now trying to convince you that AI will be better with a graph database, but what i've seen so far indicates that the LLM just needs the RDF, not an actual database with data stored in triplets. Maybe because these were small tests, if you need to store a large amount of id mappings it may be different.
