Pipelined Relational Query Language, Pronounced "Prequel"

2026-02-2112:026658prql-lang.org

PRQL is a modern language for transforming data

— a simple, powerful, pipelined SQL replacement

 from invoices
filter invoice_date >= @1970-01-16
derive {
  transaction_fees = 0.8,
  income = total - transaction_fees
}
filter income > 1
group customer_id (
  aggregate {
    average total,
    sum_income = sum income,
    ct = count total,
  }
)
sort {-sum_income}
take 10
join c=customers (==customer_id)
derive name = f"{c.last_name}, {c.first_name}"
select {
  c.customer_id, name, sum_income
}
derive db_version = s"version()"

        
  • PRQL is concise, with abstractions such as variables & functions
  • PRQL is database agnostic, compiling to many dialects of SQL
  • PRQL isn’t limiting — it can contain embedded SQL where necessary
  • PRQL has bindings to most major languages (and more are in progress)
  • PRQL allows for column lineage and type inspection (in progress)
  • PRQL is ergonomic for data exploration — for example, commenting out a filter, or a column in a list, maintains a valid query
  • PRQL is simple, and easy to understand, with a small number of powerful concepts
  • PRQL allows for powerful autocomplete, type-checking, and helpful error messages (in progress)
  • PRQL’s vision is a foundation to build on; we’re open-source and will never have a commercial product
  • PRQL is growing into a single secular standard which tools can target
  • PRQL is easy for machines to read & write
  • The PRQL compiler is written in Rust
  • We talk about “orthogonal language features” a lot

PRQL consists of a curated set of orthogonal transformations, which are combined together to form a pipeline. That makes it easy to compose and extend queries. The language also benefits from modern features, such syntax for dates, ranges and f-strings as well as functions, type checking and better null handling.

Playground Book
from employees
select {id, first_name, age}
sort age
take 10
SELECT
  id,
  first_name,
  age
FROM
  employees
ORDER BY
  age
LIMIT
  10
from track_plays
filter plays > 10_000                # Readable numbers
filter (length | in 60..240)         # Ranges with `..`
filter recorded > @2008-01-01        # Simple date literals
filter released - recorded < 180days # Nice interval literals
sort {-length}                       # Concise order direction
SELECT
  *
FROM
  track_plays
WHERE
  plays > 10000
  AND length BETWEEN 60 AND 240
  AND recorded > DATE '2008-01-01'
  AND released - recorded < INTERVAL 180 DAY
ORDER BY
  length DESC
from employees
# `filter` before aggregations...
filter start_date > @2021-01-01
group country (
  aggregate {max_salary = max salary}
)
# ...and `filter` after aggregations!
filter max_salary > 100_000
SELECT
  country,
  MAX(salary) AS max_salary
FROM
  employees
WHERE
  start_date > DATE '2021-01-01'
GROUP BY
  country
HAVING
  MAX(salary) > 100000
from track_plays
derive {
  finished = started - unfinished,
  fin_share = finished / started,        # Use previous definitions
  fin_ratio = fin_share / (1-fin_share), # BTW, hanging commas are optional!
}
SELECT
  *,
  started - unfinished AS finished,
  (started - unfinished) / started AS fin_share,
  (started - unfinished) / started / (1 - (started - unfinished) / started)
   AS fin_ratio
FROM
  track_plays
from web
# Just like Python
select url = f"https://www.{domain}.{tld}/{page}"
SELECT
  CONCAT('https://www.', domain, '.', tld, '/', page) AS url
FROM
  web
from employees
group employee_id (
  sort month
  window rolling:12 (
    derive {trail_12_m_comp = sum paycheck}
  )
)
SELECT
  *,
  SUM(paycheck) OVER (
    PARTITION BY employee_id
    ORDER BY
      month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS trail_12_m_comp
FROM
  employees
let celsius_to_fahrenheit = temp -> temp * 9/5 + 32

from weather
select temp_f = (celsius_to_fahrenheit temp_c)
SELECT
  temp_c * 9 / 5 + 32 AS temp_f
FROM
  weather
# Most recent employee in each role
# Quite difficult in SQL...
from employees
group role (
  sort join_date
  take 1
)
WITH table_0 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_0
WHERE
  _expr_0 <= 1
# There's no `version` in PRQL, but s-strings
# let us embed SQL as an escape hatch:
from x
derive db_version = s"version()"
SELECT
  *,
  version() AS db_version
FROM x
from employees
join b=benefits (==employee_id)
join side:left p=positions (p.id==employees.employee_id)
select {employees.employee_id, p.role, b.vision_coverage}
SELECT
  employees.employee_id,
  p.role,
  b.vision_coverage
FROM
  employees
  INNER JOIN benefits AS b ON employees.employee_id = b.employee_id
  LEFT OUTER JOIN positions AS p ON p.id = employees.employee_id
from users
filter last_login != null
filter deleted_at == null
derive channel = channel ?? "unknown"
SELECT
  *,
  COALESCE(channel, 'unknown') AS channel
FROM
  users
WHERE
  last_login IS NOT NULL
  AND deleted_at IS NULL
prql target:sql.mssql  # Will generate TOP rather than LIMIT

from employees
sort age
take 10
SELECT
  *
FROM
  employees
ORDER BY
  age OFFSET 0 ROWS
FETCH
  FIRST 10 ROWS ONLY

A PRQL query is a linear pipeline of transformations

Each line of the query is a transformation of the previous line’s result. This makes it easy to read, and simple to write.

PRQL serves both sophisticated engineers and analysts without coding experience.

By providing a small number of powerful & orthogonal primitives, queries are simple and composable — there’s only one way of expressing each operation. We can eschew the debt that SQL has built up.

PRQL is open-source, with an open community

PRQL will always be fully open-source and will never have a commercial product. By compiling to SQL, PRQL is compatible with most databases, existing tools, and programming languages that manage SQL. We’re a welcoming community for users, contributors, and other projects.

PRQL is designed to be extended, from functions to language bindings

PRQL has abstractions which make it a great platform to build on. Its explicit versioning allows changes without breaking backward-compatibility. And in the cases where PRQL doesn’t yet have an implementation, it allows embedding SQL with s-strings.

PRQL’s focus is analytical queries

PRQL was originally designed to serve the growing need of writing analytical queries, emphasizing data transformations, development speed, and readability. We de-emphasize other SQL features such as inserting data or transactions.

ClickHouse natively supports PRQL with

SET dialect = 'prql'

pyprql contains a Jupyter extension, which executes a PRQL cell against a database. It can also set up an in-memory DuckDB instance, populated with a pandas DataFrame.
A DuckDB extension to execute PRQL
QStudio is a SQL GUI that lets you browse tables, run SQL scripts, and chart and export the results. QStudio runs on Windows, macOS and Linux, and works with every popular database including mysql, postgresql, mssql, kdb…
Add PRQL models to your Prefect workflows with a single function.
Extension with syntax highlighting and live SQL compilation.
Write PRQL functions in PostgreSQL
Databend natively supports PRQL
Online in-browser playground that compiles PRQL to SQL as you type.

Provides Jupyter/IPython cell magic and Pandas accessor.

pip install pyprql

A CLI for PRQL compiler, written in Rust.

cargo install prqlc

brew install prqlc

winget install prqlc

Python bindings for prqlc.
JavaScript bindings for prqlc.
Compiler implementation, written in Rust. Compile, format & annotate PRQL queries.
Java, C, C++, Elixir, .NET, and PHP have unsupported or nascent bindings.

It starts with FROM, it fixes trailing commas, and it's called PRQL?? If this is a dream, don't wake me up. — Jeremiah Lowin, Founder & CEO, Prefect.

Oh wow I missed this. Clickhouse now supports PRQL: https://github.com/ClickHouse/ClickHouse/pull/50686
A few years ago, I started working on a language, called "dsql", short for declarative SQL, and a pun on "the sequel (to SQL)". I kinda chickened out of it then, the amount of study and research I needed was massive. prql here is better than I imagined: github.com/max-sixty/prql
SQL's hold on data retrieval is slipping! 8 new databases are emerging, and some speak entirely new languages for data querying. Know more infoworld.com/article/365490… #SQL #DataQuery #GraphQL #PRQL #WebAssembly
I want to give the PRQL a little boost here, "pipeline of transformations" is IMHO a good choice for readable query languages that need to deal with SQL-like aggregations, group by and count and sum all: github.com/max-sixty/prql


Read the original article

Comments

  • By jelder 2026-02-2316:311 reply

    DuckDB had the right idea: just allow some flexibility in the relative order of the `select` and `from` clauses, and make a few other concessions for ergonomics. This then becomes valid:

        from events      -- table is first, which enables autocomplete
        select
            count(),     -- * is implied, easier to type
            customer_id, -- trailing commas allowed everywhere
        group by all     -- automatically groups by all non-aggregate columns
        order by all     -- orders rows by all columns in selected order
    
    https://duckdb.org/docs/stable/sql/dialect/friendly_sql

    • By andrew_lettuce 2026-02-2316:43

      I get the ease of use - and sometimes use them myself- but implied (or relative) shortcuts are IMO a bad habit that can lead to serious issues that don't manifest as errors. I do like the from clause first which better matches the underlying relationship algebra!

  • By thesz 2026-02-2315:544 reply

    SQL is not a pipeline, it is a graph.

    Imagine three joins of three queries A,B and C, where first join J1 joins A and B, second join J2 joins A and C and third join J3 joins J1 and J2. Note that I said "queries," not "tables" - these A, B and C can be complex things one would not want or be able to compute more than once. Forget about compute, A, B and C can be quite complex to even write down and the user may really do not want to repeat itself. Look at TPC-DS, there are subqueries in the "with" sections that are quite complex.

    This is why pipeline replacements for SQL are more or less futile efforts. They simplify simple part and avoid touching complex one.

    I think that something like Verse [1] is more or less way to go. Not the Verse itself, but functional logic programming as an idea, where you can have first class data producers and effect system to specify transactions.

    [1] https://en.wikipedia.org/wiki/Unreal_Engine#Verse

    • By data_ders 2026-02-2316:061 reply

      TIL about Verse looks cool I'll have to check it out.

      > SQL is not a pipeline, it is a graph.

      Maybe it's both? and maybe there will always be hard-to-express queries in SQL, and that's ok?

      the RDBMS's relational model is certainly a graph and joins accordingly introduce complexity.

      For me, just as creators of the internet regret that subdomains come before domains, I really we could go back in time and have `FROM` be the first predicate and not `SELECT`. This is much more intuitive and lends itself to the idea of a pipeline: a table scan (FROM) that is piped to a projection (SELECT).

      • By thesz 2026-02-2318:03

        Pipeline is a specific kind of a graph.

        Yes, there will always be hard-to-express queries, the question is how far can we go?

    • By snthpy 2026-02-2418:04

      Thanks, I'll check out Verse.

      I haven't seen anyone make the point about graphs before. FWIW PRQL allows defining named subqueries that can be reused, like J1 and J2 in your example.

    • By jnpnj 2026-02-2319:00

      Crazy to think that Fortnite might unleash a new population of people who toyed with functional-logic as their first paradigm.

    • By lloydatkinson 2026-02-2316:082 reply

      Does it really help to call SQL a graph?

      • By data_ders 2026-02-2316:11

        right? like it's a graph and a relational model query and a pipeline and a language and an abstract syntax tree and declarative logical plan

      • By thesz 2026-02-2318:041 reply

        It does. Just like any other programming language.

        • By lloydatkinson 2026-02-2319:111 reply

          May as well call everything a graph at that point; meaningless.

          • By thesz 2026-02-2322:37

              > meaningless.
            
            No.

            You present "programs are graphs" as trivial truth. True trivial truths are, as you pointed out, meaningless. But you leave out degree of applicability - information in the dependence graph differs between programming languages.

            Dependencies form a graph, and analyses needed to optimize execution of the program graph differ wildly between languages. Look at С++ aliasing rules and C's "restrict" keyword.

            One can't escape the dependence graph. But one can execute dependence graph better or worse, depending (pun intended) on the programming language.

  • By dewey 2026-02-2315:283 reply

    Every time I see these layers on top of SQL I think: Just use regular, boring SQL

    It will be around for a long time, there's an infinite number of resources and examples for it and if you ever have to onboard someone into your code they don't need to learn something new. You can get pretty far by just using CTEs to "pipeline".

    • By data_ders 2026-02-2315:53

      I'm as big a SQL stan as the next person and I'm also very skeptical anytime anyone says that SQL needs to be replaced.

      At the same time, it's challenging that SQL cannot be iteratively improved and experimented upon.

      IMHO, PRQL is a reasonable approach to extending SQL without replacing SQL.

      But what I'd love to see is projects like Google's zeta-sql [1] and Substrait [2] get more traction. It would provide a more stable, standardized foundation upon which SQL could be improved, which would make the case for "SQL forever" even more strong.

      I've blogged about this before [3].

      [1]: https://github.com/google/googlesql [2]: https://substrait.io/ [3]: https://roundup.getdbt.com/p/problem-exists-between-database...

    • By itishappy 2026-02-2317:011 reply

      Does anybody just use "regular, boring SQL" in practice though? All queries I have seen are loaded with regex and other non-standard extensions.

      Is there even a db vendor that offers full ANSII SQL support? Last I'd checked the answer was no.

      • By dewey 2026-02-2317:101 reply

        In my case I consider Postgres / MariaDB "regular, boring SQL".

        • By itishappy 2026-02-2322:361 reply

          The problem persists, as Postgres and MariaDB use incompatible SQL dialects, right down to (imo) core concepts such as how to specify an automatically generated primary key.

          • By dewey 2026-02-248:571 reply

            I'm aware of that, but what I meant is that they both extended the SQL standard in a similar way and they will have equivalent higher level features like "regex and other non-standard extensions" even if they are not 100% drop-in replacements.

            • By itishappy 2026-02-2415:58

              Understood. My point is that the underlying concepts are consistent, but the syntax differs between vendors. This largely applies to PRQL as well. Admittedly slightly moreso, but if I already have to learn a new SQL dialect to use postgres, I might not mind learning a nicer one.

    • By kubb 2026-02-2316:181 reply

      Complex queries in SQL can quickly get out of control.

      The fact that you need to replicate the same complex expressions in multiple values that you select or multiple parts of a where clause is bad enough.

      That there’s no way to pipe the result of a query into another query is just adding insult to injury. (Just create a custom view bro).

      But if technology competed in quality and not in vendor lock in, we wouldn’t have to deal with C++ or JavaScript.

HackerNews