Some common transformations are surprisingly difficult to express in SQL. This page documents our recommended idioms for several of these hard-to-express transformations.
These idioms are known to the Materialize optimizer, so using them result in much better performance than using idioms from other RDBMSes.
Top-K by group
Suppose you want to group rows in a table by some key, then filter out all but
the first k elements within each group according to some ordering. In other
databases you might use window functions. In Materialize, we recommend using a
LATERAL subquery. The general form of the
query looks like this:
SELECT * FROM (SELECT DISTINCT key_col FROM tbl) grp, LATERAL ( SELECT col1, col2... FROM tbl WHERE key_col = grp.key_col ORDER BY order_col LIMIT k )
For example, suppose you have a relation containing the population of various U.S. cities.
CREATE TABLE cities ( name text NOT NULL, state text NOT NULL, pop int NOT NULL ); INSERT INTO cities VALUES ('Los_Angeles', 'CA', 3979576), ('Phoenix', 'AZ', 1680992), ('Houston', 'TX', 2320268), ('San_Diego', 'CA', 1423851), ('San_Francisco', 'CA', 881549), ('New_York', 'NY', 8336817), ('Dallas', 'TX', 1343573), ('San_Antonio', 'TX', 1547253), ('San_Jose', 'CA', 1021795), ('Chicago', 'IL', 2695598), ('Austin', 'TX', 978908);
To fetch the three most populous cities in each state:
SELECT state, name FROM (SELECT DISTINCT state FROM cities) grp, LATERAL ( SELECT name FROM cities WHERE state = grp.state ORDER BY pop DESC LIMIT 3 )
AZ Phoenix CA Los_Angeles CA San_Diego CA San_Jose IL Chicago NY New_York TX Houston TX San_Antonio TX Dallas
Despite the verbosity of the above query, Materialize produces a straightforward plan:
EXPLAIN SELECT state, name FROM ...
%0 = | Get materialize.public.cities (u1) | TopK group=(#1) order=(#2 desc) limit=3 offset=0 | Project (#1, #0)