Materialize Logo

Get started

To help you get started with Materialize, we’ll:

Try the demo in your browser →

Prerequisites

To complete this demo, you need:

We also highly recommend checking out What is Materialize?

Install, run, connect

  1. Install the materialized binary using these instructions.

  2. Run the materialized binary. For example, if you installed it in your $PATH:

    materialized -w 1
    

    This starts the daemon listening on port 6875 using 1 worker.

  3. Connect to materialized through your Materialize CLI, e.g.:

    psql -h localhost -p 6875 materialize
    

Explore Materialize’s API

Materialize offers ANSI Standard SQL, but is not simply a relational database. Instead of tables of data, you typically connect Materialize to external sources of data (called sources), and then create materialized views of the data that Materialize sees from those sources.

To get started, though, we’ll begin with a simple version that doesn’t require connecting to an external data source.

  1. From your Materialize CLI, create a materialized view that contains actual data we can work with.

    CREATE MATERIALIZED VIEW pseudo_source (key, value) AS
        VALUES ('a', 1), ('a', 2), ('a', 3), ('a', 4),
        ('b', 5), ('c', 6), ('c', 7);
    

    You’ll notice that we end up entering data into Materialize by creating a materialized view from some other data, rather than the typical INSERT operation. This is how one interacts with Materialize. In most cases, this data would have come from an external source and get fed into Materialize from a file or a stream.

  2. With data in a materialized view, we can perform arbitrary SELECT statements on the data.

    Let’s start by viewing all of the data:

    SELECT * FROM pseudo_source;
    
     key | value
    -----+-------
     a   |     1
     a   |     2
     a   |     3
     a   |     4
     b   |     5
     c   |     6
     c   |     7
    
  3. Determine the sum of the values for each key:

    SELECT key, sum(value) FROM pseudo_source GROUP BY key;
    
     key | sum
    -----+-----
     a   |  10
     b   |   5
     c   |  13
    

    We can actually then save this query as its own materialized view:

    CREATE MATERIALIZED VIEW key_sums AS
        SELECT key, sum(value) FROM pseudo_source GROUP BY key;
    
  4. Determine the sum of all keys’ sums:

    SELECT sum(sum) FROM key_sums;
    
  5. We can also perform complex operations like JOINs. Given the simplicity of our data, the JOIN clauses themselves aren’t very exciting, but Materialize offers support for a full range of arbitrarily complex JOINs.

    CREATE MATERIALIZED VIEW lhs (key, value) AS
        VALUES ('x', 'a'), ('y', 'b'), ('z', 'c');
    
    SELECT lhs.key, sum(rhs.value)
    FROM lhs
    JOIN pseudo_source AS rhs
    ON lhs.value = rhs.key
    GROUP BY lhs.key;
    

Of course, these are trivial examples, but hope begin to illustrate some of Materialize’s potential.

Create a real-time stream

Materialize is built to handle streams of data, and provide incredibly low-latency answers to queries over that data. To show off that capability, in this section we’ll set up a real-time stream, and then see how Materialize lets you query it.

  1. We’ll set up a stream of Wikipedia’s recent changes, and simply write all data that we see to a file.

    From your shell, run:

    while true; do
      curl --max-time 9999999 -N https://stream.wikimedia.org/v2/stream/recentchange >> wikirecent
    done
    

    Note the absolute path of the location where you write wikirecent, which we’ll need in the next step.

  2. From within the CLI, create a source from the wikirecent file:

    CREATE SOURCE wikirecent
    FROM FILE '[path to wikirecent]' WITH (tail = true)
    FORMAT REGEX '^data: (?P<data>.*)';
    

    This source takes the lines from the stream, finds those that begins with data:, and then captures the rest of the line in a column called data

    You can see the columns that get generated for this source:

    SHOW COLUMNS FROM wikirecent;
    
  3. Because this stream comes in as JSON, we’ll need to normalize the data to perform aggregations on it. Materialize offers the ability to do this easily using our built-in jsonb functions.

    CREATE MATERIALIZED VIEW recentchanges AS
        SELECT
            val->>'$schema' AS r_schema,
            (val->'bot')::bool AS bot,
            val->>'comment' AS comment,
            (val->'id')::float::int AS id,
            (val->'length'->'new')::float::int AS length_new,
            (val->'length'->'old')::float::int AS length_old,
            val->'meta'->>'uri' AS meta_uri,
            val->'meta'->>'id' as meta_id,
            (val->'minor')::bool AS minor,
            (val->'namespace')::float AS namespace,
            val->>'parsedcomment' AS parsedcomment,
            (val->'revision'->'new')::float::int AS revision_new,
            (val->'revision'->'old')::float::int AS revision_old,
            val->>'server_name' AS server_name,
            (val->'server_script_path')::text AS server_script_path,
            val->>'server_url' AS server_url,
            (val->'timestamp')::float AS r_ts,
            val->>'title' AS title,
            val->>'type' AS type,
            val->>'user' AS user,
            val->>'wiki' AS wiki
        FROM (SELECT data::jsonb AS val FROM wikirecent);
    
  4. From here we can start building our aggregations. The simplest place to start is simply counting the number of items we’ve seen:

    CREATE MATERIALIZED VIEW counter AS
        SELECT COUNT(*) FROM recentchanges;
    
  5. However, we can also see more interesting things from our stream. For instance, who are making the most changes to Wikipedia?

    CREATE MATERIALIZED VIEW useredits AS
        SELECT user, count(*) FROM recentchanges GROUP BY user;
    
    SELECT * FROM useredits ORDER BY count DESC;
    
  6. If this is a factoid we often want to know, we could also create a view of just the top 10 editors we’ve seen.

    CREATE MATERIALIZED VIEW top10 AS
        SELECT * FROM useredits ORDER BY count DESC LIMIT 10;
    

    We can then quickly get the answer to who the top 10 editors are:

    SELECT * FROM top10 ORDER BY count DESC;
    

Naturally, there are many interesting views of this data. If you’re interested in continuing to explore it, you can check out the stream’s documentation from Wikipedia.

Once you’re done, don’t forget to stop curl and rm wikirecent.

Next, see how Materialize can work as an entire microservices →