SELECT is used in a few ways within Materialize, letting you:
- Express a view you want to materialize. e.g.
CREATE VIEW some_view AS SELECT...
- Read from materialized views, e.g.
SELECT * FROM some_view;
- Read from your sources.
To better understand the distinction between these uses, you should check out our architecture overview.
To perform reads, Materialize simply returns the result set of a dataflow. (Naturally, if the dataflow doesn’t exist, it must be created.)
This is covered in much greater detail in our architecture
overview, but here’s a quick summary of how
SELECT in different circumstances.
|Creating view||The created dataflow is persisted using the name given to it by
|Reading from view||Returns the view’s dataflow’s current result set.|
|Reading from source||Generates a dataflow, which is torn down after returning results to client.|
|ALL||Return all rows from query (implied default).|
|DISTINCT||Return only distinct values from query.|
|DISTINCT ON ( col_ref… )||Return only the first row with a distinct value for col_ref.|
|target_elem||Return identified columns or functions.|
|FROM table_ref||The table you want to read from; note that this can also be other
|join_expr||A join expression; for more details, see our
|WHERE expression||Filter tuples by expression.|
|GROUP BY col_ref||Group aggregations by col_ref.|
|HAVING expression||Filter aggregations by expression.|
|ORDER BY col_ref …||Order results in either ASC or DESC order (ASC is implied default).
ORDER BY does not work when creating views, but does work when reading from views.
|LIMIT||Limit the number of returned results to expr.|
|OFFSET||Skip the first expr number of rows.|
Because Materialize works very differently from a traditional RDBMS, it’s important to understand the implications that certain features of
SELECT will impact your Materialize instances.
Creating views in Materialize generates a persistent dataflow, which has a different performance profile from performing a
SELECT in an RDBMS.
When creating views, using the following features of
SELECT have certain side effects:
|DISTINCT||Materialize must keep a copy of every unique view it has ever seen for the column, which grows memory usage linearly with the number of distinct values in the column.|
|ORDER BY||Materialize does not support creating views with an ORDER BY clause. Instead, you can order the results when reading from a view.|
Reading from views
SELECT on an existing view is Materialize’s ideal operation. When it receives the
SELECT targeting a view, it returns the view’s underlying dataflow’s result set from memory.
Reading from sources
While this is covered more thoroughly in our architecture overview, it’s important to understand what Materialize does to ensure its behavior matches your expectations.
Whenever Materialize reads directly from a source, it must create a dataflow, and that dataflow is torn down as soon as Materialize reads the results and returns to the client.
This means if you repeatedly send the same
SELECT statement to Materialize which required reading from a source, it must calculate the results for the query every time it’s received, i.e. it cannot incrementally maintain the results of the query in a view. To make these kinds of statements more efficient, you should instead create a view.
Creating a view
This assumes you’ve already created a source.
The following query creates a materialized view representing the total of all purchases made by users per region.
CREATE VIEW mat_view AS SELECT region.id, sum(purchase.total) FROM mysql_simple_purchase AS purchase JOIN mysql_simple_user AS user ON purchase.user_id = user.id JOIN mysql_simple_region AS region ON user.region_id = region.id GROUP BY region.id;
In this case, Materialized will create a dataflow to maintain the results of this query, and that dataflow will live on until the view it’s maintaining is dropped.
Reading from a view
Assuming you create the view listed above, named
SELECT * FROM mat_view
In this case, Materialized simply returns the results of the dataflow you created to maintain the view.
Reading from sources
SELECT region.id, sum(purchase.total) FROM mysql_simple_purchase AS purchase JOIN mysql_simple_user AS user ON purchase.user_id = user.id JOIN mysql_simple_region AS region ON user.region_id = region.id GROUP BY region.id;
In this case, Materialized will spin up the same dataflow as it did for creating a view, but it will tear down the dataflow once it’s returned its results to the client. If you regularly wanted to view the results of this query, you would want to create a view for it.