Materialize Logo

CREATE SOURCE: JSON over Kinesis

CREATE SOURCE connects Materialize to some data source, and lets you interact with its data as if it were in a SQL table.

This document details how to connect Materialize to JSON-formatted Kinesis streams.

WARNING! Kinesis support is undergoing active development, and is in Alpha status. If you run into any issues with it, please let us know with a GitHub issue.

Conceptual framework

Sources represent connections to resources outside Materialize that it can read data from. For more information, see API Components: Sources.

Syntax

CREATE MATERIALIZED SOURCE IF NOT EXISTS src_name ( col_name , ) FROM KINESIS ARN arn WITH ( field = val , ) FORMAT BYTES
Field Use
MATERIALIZED Materializes the source’s data, which retains all data in memory and makes sources directly selectable. For more information, see Materialized source details.
src_name The name for the source, which is used as its table name within SQL.
col_name Override default column name with the provided identifier. If used, a col_name must be provided for each column in the created source.
KINESIS ARN arn The AWS ARN of the Kinesis Data Stream.
WITH ( option_list ) Options affecting source creation. For more detail, see WITH options.
FORMAT BYTES Materialize receives data as raw bytes, but you can easily transform the data to JSON. For more details, see Byte format details (JSON).
ENVELOPE NONE (Default) Use an append-only envelope. This means that records will only be appended and cannot be updated or deleted.

WITH options

The following options are valid within the WITH clause.

Field Value type Description
access_key_id text A valid access key ID to the Kinesis stream.
secret_access_key text A valid secret access key to the Kinesis stream.
token text The session token associated with the credentials, if the credentials are temporary

For details about the IAM account whose details you provide, see Kinesis source details.

If you do not explicitly provide AWS credentials, Materialize will attempt to fetch credentials by reading standard environment variables, filesystem paths, and, if running on AWS ECS or EC2, the container or instance profile. Consult the Rusoto credentials documentation for details.

Credentials fetched from a container or instance profile expire on a fixed schedule. Materialize will attempt to refresh the credentials automatically before they expire, but the source will become inoperable if the refresh operation fails.

Details

Materialized source details

Materializing a source keeps data it receives in an in-memory index, the presence of which makes the source directly queryable. In contrast, non-materialized sources cannot process queries directly; to access the data the source receives, you need to create materialized views that SELECT from the source.

For a mental model, materializing the source is approximately equivalent to creating a non-materialized source, and then creating a materialized view from all of the source’s columns:

CREATE SOURCE src ...;
CREATE MATERIALIZED VIEW src_view AS SELECT * FROM src;

The actual implementation of materialized sources differs, though, by letting you refer to the source’s name directly in queries.

For more details about the impact of materializing sources (and implicitly creating an index), see CREATE INDEX: Details — Memory footprint.

Kinesis source details

Byte format details (JSON)

Materialize receives all data from Kinesis streams as bytes, which you must convert to text, and then to jsonb.

Here is an example of using a Kinesis source, converting the streams bytes into jsonb data:

CREATE MATERIALIZED VIEW jsonified_kinesis_source AS
SELECT CAST(data AS JSONB) AS data
FROM (
    SELECT CONVERT_FROM(data, 'utf8') AS data
    FROM kinesis_source
)

Append-only envelope

Append-only envelope means that all records received by the source is treated as an insert. This is Materialize’s default envelope (i.e. if no envelope is specified), and can be specified with ENVELOPE NONE.

Examples

CREATE SOURCE kinesis_source
FROM KINESIS ARN ... WITH (
    access_key_id = ...,
    secret_access_key = ...
)
FORMAT BYTES;

This creates a source that…

To use this data in views, you can decode its bytes into jsonb. For example:

CREATE MATERIALIZED VIEW jsonified_kinesis_source AS
  SELECT CAST(data AS jsonb) AS data
  FROM (
      SELECT convert_from(data, 'utf8') AS data
      FROM kinesis_source
  )