SELECT

SELECT is used in a few ways within Materialize, letting you:

To better understand the distinction between these uses, you should check out our architecture overview.

Conceptual framework 

To perform reads, Materialize simply returns the result set of a dataflow. (Naturally, if the dataflow doesn’t exist, it must be created.)

This is covered in much greater detail in our architecture overview, but here’s a quick summary of how Materialize handles SELECT in different circumstances.

Scenario SELECT dataflow behavior
Creating view The created dataflow is persisted using the name given to it by CREATE VIEW.
Reading from view Returns the view’s dataflow’s current result set.
Reading from source Generates a dataflow, which is torn down after returning results to client.

Syntax 

SELECT ALL DISTINCT ON ( col_ref , ) target_elem , FROM table_expr , WHERE expr GROUP BY col_ref , HAVING expr ORDER BY col_ref ASC DESC , LIMIT expr OFFSET expr
Field Use
ALL Return all rows from query (implied default).
DISTINCT Return only distinct values from query.
DISTINCT ON ( col_ref… ) Return only the first row with a distinct value for col_ref.
target_elem Return identified columns or functions.
FROM table_ref The table you want to read from; note that this can also be other SELECT statements.
join_expr A join expression; for more details, see our JOIN documentation.
WHERE expression Filter tuples by expression.
GROUP BY col_ref Group aggregations by col_ref.
HAVING expression Filter aggregations by expression.
ORDER BY col_ref Order results in either ASC or DESC order (ASC is implied default).

ORDER BY does not work when creating views, but does work when reading from views.
LIMIT Limit the number of returned results to expr.
OFFSET Skip the first expr number of rows.

Details 

Because Materialize works very differently from a traditional RDBMS, it’s important to understand the implications that certain features of SELECT will impact your Materialize instances.

Creating views 

Creating views in Materialize generates a persistent dataflow, which has a different performance profile from performing a SELECT in an RDBMS.

When creating views, using the following features of SELECT have certain side effects:

Field Impact
DISTINCT Materialize must keep a copy of every unique view it has ever seen for the column, which grows memory usage linearly with the number of distinct values in the column.
ORDER BY Materialize does not support creating views with an ORDER BY clause. Instead, you can order the results when reading from a view.

Reading from views 

Performing a SELECT on an existing view is Materialize’s ideal operation. When it receives the SELECT targeting a view, it returns the view’s underlying dataflow’s result set from memory.

Reading from sources 

While this is covered more thoroughly in our architecture overview, it’s important to understand what Materialize does to ensure its behavior matches your expectations.

Whenever Materialize reads directly from a source, it must create a dataflow, and that dataflow is torn down as soon as Materialize reads the results and returns to the client.

This means if you repeatedly send the same SELECT statement to Materialize which required reading from a source, it must calculate the results for the query every time it’s received, i.e. it cannot incrementally maintain the results of the query in a view. To make these kinds of statements more efficient, you should instead create a view.

Examples 

Creating a view 

This assumes you’ve already created a source.

The following query creates a materialized view representing the total of all purchases made by users per region.

CREATE VIEW mat_view AS
    SELECT region.id, sum(purchase.total)
    FROM mysql_simple_purchase AS purchase
    JOIN mysql_simple_user AS user
        ON purchase.user_id = user.id
    JOIN mysql_simple_region AS region
        ON user.region_id = region.id
    GROUP BY region.id;

In this case, Materialized will create a dataflow to maintain the results of this query, and that dataflow will live on until the view it’s maintaining is dropped.

Reading from a view 

Assuming you create the view listed above, named mat_view:

SELECT * FROM mat_view

In this case, Materialized simply returns the results of the dataflow you created to maintain the view.

Reading from sources 

SELECT region.id, sum(purchase.total)
FROM mysql_simple_purchase AS purchase
JOIN mysql_simple_user AS user
    ON purchase.user_id = user.id
JOIN mysql_simple_region AS region
    ON user.region_id = region.id
GROUP BY region.id;

In this case, Materialized will spin up the same dataflow as it did for creating a view, but it will tear down the dataflow once it’s returned its results to the client. If you regularly wanted to view the results of this query, you would want to create a view for it.