PRQL: Pipelined Relational Query Language

2023-07-2518:13519209github.com

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement - GitHub - PRQL/prql: PRQL is a modern language for transforming data — a simple, powerful, pipelined...

Show article

Pipelined Relational Query Language, pronounced "Prequel".

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it's readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it compiles to SQL.

PRQL can be as simple as:

from tracks
filter artist == "Bob Marley"                 # Each line transforms the previous result
aggregate {                                   # `aggregate` reduces each column to a value
  plays    = sum plays,
  longest  = max length,
  shortest = min length,                      # Trailing commas are allowed
}

Here's a fuller example of the language;

from employees
filter start_date > @2021-01-01               # Clear date syntax
derive {                                      # `derive` adds columns / variables
  gross_salary = salary + (tax ?? 0),         # Terse coalesce
  gross_cost = gross_salary + benefits_cost,  # Variables can use other variables
}
filter gross_cost > 0
group {title, country} (                      # `group` runs a pipeline over each group
  aggregate {                                 # `aggregate` reduces each group to a value
    average gross_salary,
    sum_gross_cost = sum gross_cost,          # `=` sets a column name
  }
)
filter sum_gross_cost > 100_000               # `filter` replaces both of SQL's `WHERE` & `HAVING`
derive id = f"{title}_{country}"              # F-strings like Python
derive country_code = s"LEFT(country, 2)"     # S-strings allow using SQL as an escape hatch
sort {sum_gross_cost, -country}               # `-country` means descending order
take 1..20                                    # Range expressions (also valid here as `take 20`)

For more on the language, more examples & comparisons with SQL, visit prql-lang.org. To experiment with PRQL in the browser, check out PRQL Playground.

Current Status - April 2023

PRQL is being actively developed by a growing community. It's ready to use by the intrepid, either with our supported integrations, or within your own tools, using one of our supported language bindings.

PRQL still has some minor bugs and some missing features, and probably is only ready to be rolled out to non-technical teams for fairly simple queries.

Here's our current Roadmap and our Milestones.

Our immediate focus for the code is on:

Building out the next few big features, including types and modules.
Ensuring our supported features feel extremely robust; resolving any priority bugs.

We're also spending time thinking about:

Making it really easy to start using PRQL. We're doing that by building integrations with tools that folks already use; for example our VS Code extension & Jupyter integration. If there are tools you're familiar with that you think would be open to integrating with PRQL, please let us know in an issue.
Making it easier to contribute to the compiler. We have a wide group of contributors to the project, but contributions to the compiler itself are quite concentrated. We're keen to expand this; #1840 for feedback.

Get involved

To stay in touch with PRQL:

Follow us on Twitter
Join us on Discord
Star this repo
Contribute — join us in building PRQL, through writing code (send us your use-cases!), or inspiring others to use it.
See the development documentation for PRQL. It's easy to get started — the project can be built in a couple of commands, and we're a really friendly community!

Explore

Repo organization

This repo is composed of:

It also contains our testing / CI infrastructure and development tools. Check out our development docs for more details.

Contributors

Many thanks to those who've made our progress possible:

Read the original article

Comments

By andrewl-hn 2023-07-2519:002 reply

For me the examples on the website https://prql-lang.org/ are the biggest selling point for PRQL, in particular the SQL it generates. It looks clean, straightforward, something I would've written myself.

In general, I like this slightly more careful take on modern database development. 10-15 years people would start a brand new database like Mongo, or Riak, or Influx, or whatever, and would try to convince application developers to select it for new projects. But recently we started getting more conservative options like EdgeDB, TimescaleDB, or even PRQL which all expect us to run Postgres with some addons and / or query preprocessors. Tech like this is so much easier to adopt!

I'm really liking what Edge folks are doing with schemas and migrations, but I do find PRQL syntax much more intuitive. My application code is littered with data transformation pipelines already: all these map / filter chains in TYpeScript, iterators in Rust, enumerables in Ruby, streams in Java, LINQ in .net, Rx in dozens of languages etc. etc. So the concept is very, very familiar. Getting nice SQL queries out of PRQL that I can store, inspect later, see the query plans, add indices where necessary, is just great. It's such a good tool!

By OJFord 2023-07-2520:567 reply

To me it seems quite nice, but really just trivially different from SQL - like if Ruby was 'friendlier syntax that transpiles to Python', meh? You'd use whichever you happened to learn first and not bother with the other. (That's often true even though it's more than that of course.)

The examples arbitrarily make SQL look more verbose:

    SELECT
      id,
      first_name,
      age
    FROM
      employees
    ORDER BY
      age
    LIMIT
      10

Yes! Of course I'd rather:

    from employees
    select {id, first_name, age}
    sort age
    take 10

..but wait, actually the SQL could've been:

    select id, first_name, age
    from employees
    order by age
    limit 10

and it's more verbose by a character or two... (no braces, but 'order by' vs 'sort')

By viraptor 2023-07-265:081 reply

It's not the tiny changes in the syntax/order that matter here. It's that if you want to limit the columns or do something else with them afterwards, in PRQL you append that text to the end. In SQL you'd have to wrap it around your original query, it inject joins in the middle. It doesn't read well anymore, because "how" you're doing things eclipses "what" you're doing.

For a simple select, this just doesn't matter. But you also wouldn't bother with PRQL for simple selects.

By meitham 2023-07-265:32

In python I use sqlalchemy core to build the SQL for me, but I manage the transactions myself. Sqlalchemy query builder offers a similar role to what you described, as in I can add another select or where field after I have already added a sort criteria. It’s clever enough to order these correctly based on the SQL dialect.

By andrewl-hn 2023-07-2521:213 reply

I use CTEs, window functions, and groupings all the time when I write reporting queries. These things tend to be much more verbose in raw SQL, and ORMs / Query Builders either do not support some of these features at all or do very poor job (like, force me to put raw SQL substrings in my code), or force to write DSLs that are even more verbose than raw SQL. Look at corresponding PRQL samples, and you may see an appeal.

Also, I agree, no one should write SQL like this - screaming keywords, superficial newlines, etc. I don't think this style made sense ever, even in 1970s.

By Pxtl 2023-07-2521:531 reply

While there's some stuff in C#/LINQ/EF that's more verbose (left joins are often a nightmare) or not-supported, I'll always say that I prefer writing queries in EF than in SQL, at least when dealing with SQL features that are supported by EF (which is a lot of them, it's a very expressive dialect).

But EF lets you start with FROM, lets you do whichever keywords you need in whichever order (instead of WHERE -> GROUP BY -> HAVING and you've got to CTE or Subquery if you want another GROUP BY). It also lets you access the members of a group because the objects are still treated as a graph instead of being pulverized into a flat table like SQL does. It also makes your FKs into first-class navigational properties of the table.

Like, if I have an addressID and I want to get its country code?

In MS SQL that's

    SELECT CountryCode
    FROM Country
    INNER JOIN ProvState ON ProvState.CountryID = Country.ID
    INNER JOIN Address ON Address.ProvStateID = ProvState.ID
    WHERE Address.ID = @AddressIDParam

In EF that's

    db.Addresses
      .Where(a => a.ID == addressIDParam)
      .Select(a => a.ProvState.Country.CountryCode)

EF has a hell of a lot of flaws, but linguistically I love it. Yes there's a lot of aliasing boilerplate in EF, but the ability to walk the foreign keys and the fact that you can put the select after the table name pays off so very well.

Also there's a dialect of LINQ that looks more like SQL but it's kind of weird and I don't love it so I prefer to use the lambda syntax above.

In that dialect, it's

    from a in db.Addresses
      where a == addressIDParam
      select a.ProvState.Country.CountryCode

which is even more terse and SQL-y although I find it a weird linguistic mash-up.

By pragmatic 2023-07-2522:324 reply

The downside though is you have to grab the sql it’s generating somehow to try to optimize it, figure out what crazy cross apply madness it’s going or to figure out why it’s blowing sql servers recursion limit.

I prefer to avoid linq syntax now. It’s a false economy.

By Turskarama 2023-07-260:06

In my experience, it quite rarely uses cross apply, and typically only for functions where you're doing complicated join filtering and trying to extract full EF entities.

If you don't use linq you can't get EF entities anyway, and if you don't need EF entities you can can still use linq to get a projection and avoid the cross apply! The worst case scenario often ends up being writing a linq query to get your primary keys, then loading entities just using the list of keys.

I've used linq for years and it makes the basic and intermediate stuff way simpler, and the complex stuff no more complex. In hundreds of thousands of lines of code, we have exactly TWO queries where it ended up simpler to hand write SQL, and EF did not prevent us from doing that!

By WorldMaker 2023-07-2618:26

EF for some time (since the "Core" reboot) has made things like "crazy cross apply" "opt-in only" and instead you get runtime exceptions for LINQ query shapes it thinks can only be done that way. Some of those crazy things early days EF silently supported still aren't even possible in today's EF, for mostly better (but sometimes rarely worse), let alone opt-in behaviors.

I think more than ever current EF generates SQL that straightforward looks like you would expect given the LINQ you've written, with only surprises being runtime exceptions before it ever tries to confuse an SQL database with a horribly deoptimized query.

Today's EF also has more tools than ever to optimize its queries. It logs query text by default in some debug contexts to visible debug consoles, and to certain telemetry providers when errors occur in certain release/Production contexts. It's easy to opt-in to such logging in even more contexts should you desire that. (Thanks to modern .NET's ubiquitous System-owned ILogger infrastructure and the modern "Generic Hosts" approach.) For advanced scenarios, intercepting specific LINQ queries and enhancing/optimizing their output based on domain knowledge of the queries or the specific databases is easier than ever with simple dependency injection.

By aayjaychan 2023-07-2610:32

I drew a different conclusion from similar experience. I avoid navigation properties and other advanced mapping features, so an entity maps flatly to one table.

The LINQ queries will be more verbose as you'll need to write the join and group clauses explicitly, but I find it much easier to predict the performance of queries since the generated SQL will look almost exactly the same as the LINQ syntax. It's also less likely to accidentally pull in half the database with `.Include()` this way.

By Pxtl 2023-07-2523:15

Oh I agree the actual mapping framework where it converts the EF/LINQ is deeply flawed. But that's the underlying implementation.

I just mean on a linguistic level, I tremendously prefer EF/LINQ/C# to SQL.

By imtringued 2023-07-2612:17

I have some ideas how to implement a high performance ORM (yes that sounds strangey doesn't it?) that makes extensive use of CTEs, window functions and temporary tables. Writing reports in that ORM would be downright trivial.

From my perspective the JPA speciation is downright harmful. There can never be a good JPA implementation. I don't understand how there are no Java ORMs that actually learned anything from the mess that Hibernate etc are.

Take a look at how nasty even some of the most trivial concepts like calculated properties.are: https://blog.eyallupu.com/2009/07/hibernate-derived-properti...

In my hypothetical ORM design, calculated properties would be easy and fast by default.

Also, the CORBA style lazy loading based on proxies is ridiculous. As if people truly wanted a landmine like that to blow up on them.

By pjot 2023-07-2521:311 reply

At least we can all agree on leading commas though!

By slotrans 2023-07-2522:551 reply

death before trailing commas

By akdor1154 2023-07-260:121 reply

Well if it was after they wouldn't be trailing would they?

By euroderf 2023-07-269:32

for the win

By hn_throwaway_99 2023-07-2522:101 reply

I agree that I think the difficulty with the uptake will be "it's really just less annoying SQL", and it's hard to overtake a technology that's so ubiquitous when your fixes are really "nice to haves" vs something truly transformational.

That said, it's not the succinctness that's an improvement, it's that the pipelined nature of PRQL really maps much better to how people should think about queries, and also how the server executes them! Something as trivially simple as putting the from clause first means I can now get much better typeahead/autocomplete support from dev tools. Heck, I already do this now in a slightly more annoying manner: I write "select * from some_table ..." first, then I go and type out the actual columns I want because once the from clause is correct my IDE will autocomplete/show errors for possible columns.

By mikea1 2023-07-262:311 reply

> the pipelined nature of PRQL really maps much better to how people should think about queries

I disagree. Database engines take SQL and transform it into an execution plan that takes into consideration database metadata (size, storage, index analytics, etc.). Queries should be thought of with a _set based_ instead of _procedural_ approach to maximize the benefits of this abstraction - diving into the implementation details to guide the execution plan formation only when necessary.

Also, the pipeline approach could be achieved with common table expressions (CTEs), right?

That said, I think PRQL looks promising because it is a solid attempt to make RDBMS development more approachable. I also like that `from` comes before `select`: it is far more readable. A solid and modern IDE experience for PRQL could be a "killer app".

By wodenokoto 2023-07-264:584 reply

I disagree. I find it extremely hard to reason about large queries as set transformations, whereas it is much easier to break it down to "first this, then that". And this is long before I've even started writing my first line of SQL.

So let me write it procedurally and have the optimization engine fix it for me, just like how it fixes my SQL.

Even SQL queries are often better understood procedurally. Take this one [1]:

    SELECT article, dealer, price
    FROM   shop s1
    WHERE  price=(SELECT MAX(s2.price)
                  FROM shop s2
                  WHERE s1.article = s2.article)
    ORDER BY article;

That inner WHERE clause doesn't make sense in my opinion, unless you think of it procedurally as for each row in s1, ask do a search for the highest price amongst all items that share article number.

[1] https://dev.mysql.com/doc/refman/8.0/en/example-maximum-colu...

By hn_throwaway_99 2023-07-267:17

Completely agree, thanks for putting it better than I could have, with an excellent example. Correlated subqueries like the example you give, or similarly lateral joins in postgres, fundamentally are treated like for loops by DB engines anyway.

Semi-related, but the example you give is also why I love Postgres' "DISTINCT ON" functionality (I don't know if other DBs have something similar) - it makes it so much easier to reason about these "give me the 'first' one from each group" type queries without having to resort to correlated subqueries.

By slt2021 2023-07-2617:49

queries like these are best suited for window functions, although I am not sure Mysql supports it:

  SELECT article, dealer, price FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY article ORDER BY price DESC) as rnk
    FROM   shop s1
  ) sub 
  WHERE sub.rnk=1
  ORDER BY article;

this query will be a single pass over table without loops/joins

By tgulacsi 2023-07-265:27

This is the "set based" approach for the MAX: there does not exist a bigger element:

  SELECT article, dealer, price
    FROM   shop s1
    WHERE  NOT EXISTS (SELECT 1 FROM shop s2 
                         WHERE s2.price > s1.price AND
                               s2.article = s2.article)
    ORDER BY article;

By imtringued 2023-07-2612:03

Unpopular opinion.

The uncorrelated example should have been rewritten with a CTE and should have been aliased as 'article_max_price' as if it was a computed property and where price = amp.price

By ravi-delia 2023-07-260:27

That's a really simple example though. I think the real pitch for PRQL is that the syntax is way more regular- the group operator just runs a normal pipeline on each group and multiple derivations with different operations between them don't need to all be plotted out in advance, just as an example. SQL suffers because it's not really composable, especially once you get outside of specific modern versions.

By jtsuken 2023-07-2522:17

Being able to pick the source (i.e. the FROM clause) first is useful in itself, as you then get the benefit of sensible auto-complete suggestions.

By jatins 2023-07-267:15

Even though a small example doesn't highlight the advantages well, I still prefer the PRQL syntax in this because that reads like map/filter code I'd write in some programming language.

So the mental model seems easier to me

By krick 2023-07-2613:06

"Order by" is a much better name, by the way. I'm permanently slightly annoyed by the fact that in programming it's a custom to call ordering "sorting" for some unimaginable reason.

By t8sr 2023-07-2519:1910 reply

I don't get that - to me the examples are much less readable than SQL and I don't understand why I should want to use this. Like, yes, you can reorder the query sections, which seems to be everyone's complaint about SQL, but then you also have multiple types of brackets, colons and other syntax for no reason, all while not really accomplishing anything SQL doesn't already do.

What's the attraction?

By snthpy 2023-07-2519:412 reply

If you're happy with SQL then there isn't much point.

For the folks building and supporting PRQL, SQL just has a few too many warts and the popularity of tools like Pandas, dplyr, Polars, LINQ, ... shows that for analytical work we often like to work with our data differently. Other frameworks and languages feel that we should throw out Relational Algebra as well but we feel that's like throwing the baby out with the bathwater.

PRQL's core tenets are that Relational Algebra is foundational and a great abstraction but what's needed is a modern, ergonomic language that allows us to work with data the way most of us conceive of it - as pipelines of transformations.

Personally I feel much more productive in PRQL. I can start with a dataset and append one transformation at a time, writing from top to bottom as I go along without having to jump back and forth between the SELECT and the WHERE and the GROUP BY clause etc... .

Also, if I want to take out a step, I can just comment out that line and the rest of the pipeline usually still works. This might seem like a minor thing but in my experience it's those kind of ergonomics that make the difference in actual day to day work rather than fancy theoretical features you only use once in a blue moon. It's therefore worth noting that this was an intentional design decision. You try and take out some steps from your SQL query and see how well the rest of your query holds up.

By aidos 2023-07-2520:051 reply

For what it’s worth, it looks really readable to me. I have decades of sql experience at this point so consider myself pretty proficient but I can see the appeal of having a terser syntax for transformations. I especially like the “it just makes sql” approach.

Stepping through the second example on that page I know how I could do the same in sql, and I also know that it would be harder for most people to follow.

Question about that example though - why is there an aggregate without an assignment to a variable (average gross_salary) like the other examples?

By snthpy 2023-07-2520:211 reply

I couldn't find that example right now but there is a similar line in the main example on the prql-lang.org homepage:

    aggregate {
        average total,
        ...
    }

I can't definitively say why it is there, other than perhaps just to show that you can specify aggregations without having to give them an alias. The column name won't be pretty but if you're just interactively trying something out and want to see the results then you probably won't care.

Does that help?

By aidos 2023-07-2520:411 reply

Gotcha. The thing that’s not immediately clear from the syntax is which columns I’m getting out as a final result of my query. I guess the last select you ran + any derived ones since then?

By snthpy 2023-07-2521:451 reply

That's right. It can be a bit hard to keep track of but the rules are quite simple. You start with all the columns from a `from`, then any `derive` just adds columns. A `select` restricts the columns to those selected and a `group + aggregate` replaces the columns with those from the `group` + those from the `aggregate`.

I once tabled a proposal whether each query should be forced to end with a `select` to make the final column list explicit. This could be generated by the compiler. It was felt that that wasn't necessary though and would also be somewhat arbitrary since you also need to know what columns are available at each previous step of the pipeline if you want to be able to make modifications. As the tooling improves, you could perhaps be shown the current list of columns as you hover over each line?

By aidos 2023-07-2522:00

There’s nothing stopping people from using their own convention of selecting early to grab the stuff they need and selecting again right at the end to be explicit about what they want. Like select *, it’s better to just be explicit.

By snthpy 2023-07-2519:59

Disclaimer: I'm a PRQL contributor.

By hn_throwaway_99 2023-07-2522:301 reply

The attraction is that, especially for more complicated queries with complex joins, subqueries, aggregates, etc., that the structure of PRQL much more closely matches the "English", mental-model of the query. I can just read PRQL linearly, from top to bottom, and at each point I know what is being "pipelined" into the next phase of the query.

With SQL I have to look all over the place. And it's not just that FROM should come before SELECT, it's that if I'm doing, say, an aggregation with a group by, normal SQL doesn't really have anything to make me think about the ungrouped rows, and then merge them together to get the aggregate values. With PRQL I can just go top to bottom, and for me it's much easier to reason about (i.e. first get all the rows, then group by some specific columns, then take aggregates, etc.)

And I say this as someone who spends about half my days in SQL at present.

By pragmatic 2023-07-2522:321 reply

The downside is, debugging what actual sql this monstrosity spits out.

By hn_throwaway_99 2023-07-2522:53

Two points:

1. The SQL that I've seen output by PRQL isn't monstrous, at least no more than it would be coding by hand.

2. I do agree that I don't think PRQL will have much uptake until it is baked in to DB systems, at least as a plugin. One reason (among many) that I hate ORM systems is that with native SQL it's very easy for me to just capture a SQL statement from my DB logs and then search my code for an offending statement. But given how PRQL is transpiled, I don't see any reason it couldn't be essentially part of the query planner, so that query logs etc. are reported in PRQL and not SQL. E.g. the query planner already takes text (SQL) -> Query Plan, it seems rather straightforward to go text (PRQL) -> text (SQL) -> Query Plan, with the SQL intermediate just being an implementation detail I don't have to care about as a DB user.

By 0cf8612b2e1e 2023-07-2519:30

The attraction is something that was designed after decades of usage and PL research. Consistency of syntax is a big one for me. A favorite example of mine:

  SELECT substring('PostgreSQL' from 8 for 3);
  SELECT substring('PostgreSQL', 8, 3); -- PostgreSQL-only syntax
  SELECT trim(both from 'yxSQLxx', 'xyz');
  SELECT extract(day from timestamp '2001-02-16 20:38:40');

Taken from: https://www.edgedb.com/blog/we-can-do-better-than-sql

Maybe if SQL would give me that monumental ask of trailing commas, perhaps I would hate it less.

By slt2021 2023-07-2519:291 reply

I think main difference is how PRQL translates into query execution plan, with SQL you need to read entirety of query to get rough understanding of order of operations (especially if you join two-three tables and have some nested and lateral queries).

with PRQL I see that the order will be explicit and set up by developer, so any footguns will be evident.

things like predicate push down, optimization fence, variable rewrite, etc are not needed to be relied upon, because pipeline of query is more explicit in PRQL

also since it is new lang, it can be naturally extended into ETL type data pipelines

also because PRQL can be translated into query execution plan - it can be converted into non-SQL languages like MongoDB or pandas / spark / etc, eliminating SQL altogether for distributed nosql engines

By aidos 2023-07-2520:291 reply

Is that true? Doesn’t this just effectively translate into sql to be executed on the db as usual if you’re running on an rdbms?

Looking at the examples and the sql generated in the playground you’re just going to have the query planner doing its thing to decide how to execute eventually anyway.

By slt2021 2023-07-2520:40

It transpiles into sql currently, but my understanding it is not long term vision.

Language arch allows in the future to replace sql for querying data

By dlisboa 2023-07-2520:18

Maybe you didn't see the more complex examples, but some of the SQL queries it abstracts are very unintuitive to write by hand. PRQL turns it into something more readable and closer to the expected mental model.

For people who can write error-free and complex portable SQL queries by heart this may not be really interesting. From experience, however, that is not a skill most developers have.

By the_mitsuhiko 2023-07-2520:424 reply

I don't use PRQL but I absolutely get the appeal but specifically on the readability part, some things that are easy in PRQL are just awful in SQL.

From the website for instance this is a nightmare to do in SQL:

    from employees group role (sort join_date take 1)

By snthpy 2023-07-2521:501 reply

Unfortunately the linebreaks were lost and, as shown that isn't a valid PRQL query. It would have to be either

    from employees
    group role (
        sort join_date 
        take 1
        )

    from employees | group role (sort join_date | take 1)

In English:

    Take the 1st employee 
    by (earliest) join_date 
    for each role 
    from the set of employees

By zX41ZdbW 2023-07-2521:54

ClickHouse:

SELECT * FROM employees ORDER BY join_date LIMIT 1 BY role

By housecarpenter 2023-07-2610:23

Isn't this a fairly simple way of doing this? That said, it is a bit non-obvious if you haven't seen it before.

    select earliest_joiner.* from employees as earliest
    left join employees as earlier on
      earlier.role = earliest.role
      and earlier.join_date < earliest.join_date
    where earlier.id is null
    order by earliest.join_date

By virtualwhys 2023-07-2523:27

What is the generated SQL of that expression?

This is indeed a sticky problem, one that usually requires a subselect or other workaround to address the non-determinism of group by + order by; i.e. one cannot simply "select * from employees group by role order by join_date limit 1" and be guaranteed to get the expected ordering.

By zX41ZdbW 2023-07-2521:241 reply

What does it mean? Is it the same as "LIMIT BY" in ClickHouse?

https://clickhouse.com/docs/en/sql-reference/statements/sele...

By the_mitsuhiko 2023-07-2521:35

It picks the first result grouped by role and sorted by join date. I believe this can be expressed with limit by in ClickHouse.

By ravi-delia 2023-07-260:32

I mean if you can look at the "expressions" example on the homepage and say the SQL is more readable than PRQL, then more power to you. Hell, actually more power to you, that's extremely impressive. But I think for many people, especially programmers more used to parsing brackets and nesting than pseudo-english, PRQL is easier in the places where you're trying to compose.

By andrewl-hn 2023-07-2521:11

Oh, I'm a big fan of raw SQL, too. There's very little out there that can bring me same joy as a well-written 30-50 lines-long sql query! Like, I can spend an hour or two on it, but when it runs and produces the results in the desired form, it's sooo rewarding! And realizing that producing similar results in my main language (be it Rust or TypeScript, or whatever) would take me 3-4 days instead (and it would run much longer) makes me appreciate SQL programming more and more. In terms of "usefulness per minute spend" nothing comes close to SQL!

However, the syntax can be a bit clunky from time to time. I'm very fortunate that I use Postgres only in the past 5-7 years, so CTEs, and things like Timescale or JSONB, or PlV8 / PlRust for stored procedures are often on the menu. Yet, simple things still require repeating fragments of code from time to time, and complex grouping or window functions code often looks too convoluted.

And as I wrote in my GP comment: I (and many other developers) already use pipelines for data manipulation in code - beyond databases. Say, we got data from an API endpoint, or we mixing data from several data sources. SQL is familiar because I see it often, PRQL is familiar because I use similar pipelining concepts often in my no-SQL code. Would I use PRQL for super simple queries? Probably not. Would I be upset if one of my coworkers used it for simple queries? No, why would I? Would I try implementing complex reporting query using PRQL? Yeah, I would. Partially because I suspect I would get to my data in desired form quicker with pipelines, and partially because I know: if I get stuck I would convert my intermediary query to SQL and pick up from there.

For me PRQL looks better then most DSLs for ORMs or Query Builders in most languages. Adding a new tool to a project may be annoying, but depending on a project I'd rather pay this price once if it makes my less SQL-savvy team mates more comfortable. "Yet another syntax" is obviously the main point against, but from the application developer perspective ORMs / Query Builders often force one to learn "yet another syntax" anyway. "Don't use ORM / Builder" is an often voiced opinion and yet in practice we work with them all the time, too.

So, I view PRQL not as an alternative to SQL but as a third database communication dialect in addition to SQL and my ORM / Query Builder DSLs.

By spullara 2023-07-2519:231 reply

it seems like any SQL parser could let you put FROM first and solve a lot of the annoyances of SQL

By mulmen 2023-07-2519:491 reply

The stiff syntax is a feature. The infamous 1000 line sales report query is already a bear to maintain. If you get a clever developer reordering syntax it will only be worse.

By strokirk 2023-07-2522:581 reply

It’s certainly not a feature. A lot of important SQL usage is ad-hoc queries, and they are more annoying to type than they should be.

By mulmen 2023-07-260:171 reply

I don’t understand how changing the order of the clauses makes a query easier to type.

By spullara 2023-07-261:132 reply

If you have auto-complete it can see what table you are using and complete the column names. Starting with SELECT it could be any column in the database.

By mulmen 2023-07-262:151 reply

Huh. I have been doing this for like 15 years and never have a problem. Text editors like Sublime Text suggest names based on the content of the file. SQL Developer, SQLWorkbench, and DataGrip all seem to handle it just fine.

By spullara 2023-07-266:061 reply

It will suggest names that aren't in the table you are going to query unless it is psychic.

By mulmen 2023-07-2616:19

It is psychic in the sense that it makes educated guesses based on previous entries and the current database and schema. This is the same kind of educated guess it would make with a table name.

By nhinck2 2023-07-261:451 reply

I see this complaint stated over and over again, how hard is it really to type SELECT * FROM x a and then go back?

By spullara 2023-07-262:501 reply

Is there any downside at all to putting FROM first?

By mulmen 2023-07-2616:20

Disrupting decades of inertia to placate a small humber of unskilled novices.

By zX41ZdbW 2023-07-2520:171 reply

We have recently merged PRQL support into ClickHouse: https://github.com/ClickHouse/ClickHouse/pull/50686

It's currently more like an experiment - I'm not sure if it will be usable or useful. There are some concerns about Rust, although minor: https://github.com/ClickHouse/ClickHouse/issues/52053#issuec...

By snthpy 2023-07-2522:171 reply

This is awesome! Thank you!

Would that mean it would also be able to use it in clickhouse-local?

By zX41ZdbW 2023-07-2522:21

Yes:

    clickhouse-local --dialect prql

or:

    $ clickhouse-local

    :) SET dialect = 'prql'

By 0cf8612b2e1e 2023-07-2518:515 reply

I really want this to take off and become a first party supported language for Postgres. Yes, yes, relational algebra is beautiful and all that jazz. SQL is a garbage, first pass design that should have been replaced decades ago. Even Codd has complaints about it. It is amazing what he invented, but we have learned a lot about PL design since then.

By aerzen 2023-07-2519:151 reply

People really like to associate relational algebra with SQL, probably because they learned them one alongside another. But SQL is really terrible relational language - it breaks a few core concepts in different places, like relations being unordered sets, that you can ORDER BY. This bubbles up as relations losing ordering after being wrapped into a subquery, which is really unexpected.

PRQL has a data model very similar to the relational one. The only big difference is that relations are ordered - they are defined as arrays of tuples. So let's hope that PRQL gets to be known as "the relational language Mk II"

By cmrdporcupine 2023-07-2617:59

Seems like you'd have to break quite a few relational model concepts in order to retain compatibility with SQL, no?

Ordered tuples like you say, but also duplicate tuples. And null values.

By snthpy 2023-07-2522:261 reply

Not quite what you're asking for but DuckDB has both PRQL [1] and Postgres [2] extensions, so you could probably query your Postgres database with PRQL from there.

There's also a DBeaver plugin [3] which we still need to document better and simplify the usage of but you could potentially also use that to query Postgres with PRQL.

Finally there is pyprql [4] with which you could query Postgres from a Jupyter notebook.

[1]: https://github.com/ywelsch/duckdb-prql

[2]: https://duckdb.org/docs/extensions/postgres_scanner.html

[3]: https://github.com/PRQL/prql/issues/1643

[4]: https://github.com/PRQL/pyprql

(Disclaimer: I'm a PRQL contributor.)

By stevage 2023-07-2523:49

By first party I assume they mean using psql and postgres's own tooling.

By vkazanov 2023-07-2519:12

Relational algebra is still great, it is the sql that made an unbelievable mess out of this beautiful idea.

By globular-toast 2023-07-269:501 reply

Same. I first learnt SQL 20 years ago as a teenager and even back then I remember thinking how odd and, quite literally, backwards the language was. Back then I probably thought I just didn't understand enough to see why it had to be so. Now I know there is no reason. SQL has become more like a natural language. There's no arguing against it, you have to speak it even if it doesn't make sense. But it's not a natural language and we can do better.

By ako 2023-07-275:571 reply

It’s optimized for reading and use: the first time you see a query written by another developer, the interesting bit is what data it returns (the select part). Once you know that, you may be interested in the how, where does that data come from? Same for functions or methods, you first see the signature, input and output structures, before you see the implementation.

By globular-toast 2023-07-279:29

Good point, I didn't think of it like that and, oddly, I do often advocate for writing things "backwards" from highest level of abstraction to lowest. Maybe there is a good reason SQL has stuck around.

By burcs 2023-07-2519:16

Yeah, I have never liked that you choose what you are querying before you select the source. I think the formatting here is so much more intuitive.