Command line flags
The materialized binary supports the following command line flags:

| Flag | Default | Description |
|------|---------|-------------|
| --address-file | N/A | Address of all coordinating Materialize nodes |
| -D, --data-directory | ./mzdata | Where data is persisted |
| -h, --help | N/A | NOP—prints binary's list of command line flags |
| --experimental | Disabled | Enables experimental mode (see Experimental mode below) |
| --listen-addr | 0.0.0.0:6875 | Materialize node's host and port |
| --logical-compaction-window | 60s | The amount of historical detail to retain in arrangements |
| --persistence-max-pending-records | 1000000 | Maximum number of input records buffered before flushing immediately to disk |
| --process | 0 | This node's ID when coordinating with other Materialize nodes |
| --processes | 1 | Number of coordinating Materialize nodes |
| --tls-cert | N/A | Path to TLS certificate file |
| --tls-key | N/A | Path to TLS private key file |
| --threads | REQ | Dataflow worker threads (renamed to --workers in v0.4.0) |
| -w, --workers | REQ | Dataflow worker threads |
| -v, --version | N/A | Print version and exit |
| -vv | N/A | Print version and additional build information, and exit |
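For example, since the worker count is the only required flag, a minimal single-node invocation (illustrative) looks like:
$ materialized --workers 1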
Data directory
materialized creates a directory where it persists metadata. By default, this directory is called mzdata and is situated in the current working directory of the materialized process. Currently, only metadata is persisted in mzdata. You can specify a different directory using the --data-directory flag. Upon start, materialized checks for an existing data directory, and will reinstall source and view definitions from it if one is found.
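For example, to keep metadata on a dedicated volume (the path below is a hypothetical choice; adjust for your system):
$ materialized --workers 1 --data-directory /var/lib/materialize/mzdata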
Worker threads
Each materialized instance runs a specified number of timely dataflow worker threads. Worker threads can only be specified at startup by setting the --workers flag, and cannot be changed without shutting down materialized and restarting. In the future, dynamically changing the number of worker threads may become possible over distributed clusters.
Changed in v0.4.0: Renamed the --threads flag to --workers.
How many worker threads should you run?
Adding worker threads allows Materialize to handle more throughput. Reducing worker threads consumes fewer resources and can reduce tail latencies.
In general, you should use the fewest number of worker threads that can handle your peak throughputs. This is also the most resource-efficient configuration.
You should never run Materialize with more than n - 1 worker threads, where n is the number of physical cores. Note that major cloud providers list the number of hyperthreaded cores (or virtual CPUs). Divide this number by two to get the number of physical cores available. The reasoning is simple: Timely Dataflow is very computationally efficient and typically uses all available computational resources. Under high throughput, you should see each worker pinning a core at 100% CPU, with no headroom for hyperthreading. One additional core is required for metadata management and coordination. Timely workers that have to fight for physical resources will only block each other.
For example, an AWS r5d.4xlarge instance has 16 vCPUs, or 8 physical cores. The recommended worker setting on this VM is --workers 7.
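As a sketch on such an instance (assuming a Linux host where nproc reports vCPUs):
$ nproc
16
$ materialized --workers 7   # 16 vCPUs / 2 = 8 physical cores; 8 - 1 = 7 workers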
Horizontally scaled clusters
The --processes flag controls the total number of nodes in a horizontally-scaled Materialize cluster. The IP addresses of each node should be specified in a file, one per line, which is passed via the --address-file flag. When each node is started, it must additionally be told which process it is, by setting --process to a value between 0 and processes - 1.
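A hypothetical two-node launch (the IP addresses and file name below are placeholders):
$ cat addresses
10.0.0.1
10.0.0.2
$ materialized --workers 7 --processes 2 --process 0 --address-file addresses   # run on 10.0.0.1
$ materialized --workers 7 --processes 2 --process 1 --address-file addresses   # run on 10.0.0.2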
You should not attempt running a horizontally-scaled Materialize cluster until you have maxed out vertical scaling. Multi-node clusters are not particularly performant at this time. For example, an x1.32xlarge instance on AWS has 128 vCPUs, and will be superior in every way (reliability, cost, ease-of-use) to a multi-node Materialize cluster with the same total number of vCPUs. It is our performance goal that Materialize under that configuration be able to handle every conceivable streaming workload that you may wish to throw at it.
Listen address
By default, materialized binds to 0.0.0.0:6875. This means that Materialize will accept any incoming SQL connection to port 6875 from anywhere. It is the responsibility of the network firewall to limit incoming connections. If you wish to configure materialized to only listen to, e.g., localhost connections, you can set --listen-addr to localhost:6875. You can also use this flag to change the port that Materialize listens on from the default of 6875.
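For example, to accept connections only from the local machine (illustrative):
$ materialized --workers 1 --listen-addr localhost:6875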
Compaction window
The --logical-compaction-window option specifies the duration of time for which Materialize is required to maintain full historical detail in its arrangements. Note that compaction happens lazily, so Materialize may retain more historical detail than requested, but it will never retain less.
The value of the option is a duration string like 10ms (10 milliseconds) or 1min 30s (1 minute, 30 seconds). The special value off disables compaction entirely, so Materialize retains all historical detail.
The logical compaction window ends at the current time and extends backwards in time for the configured duration. The default window is 60 seconds.
See the Deployment section for guidance on tuning the compaction window.
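For example, to shorten the window to one second, or to disable compaction altogether (both invocations illustrative):
$ materialized --workers 1 --logical-compaction-window 1s
$ materialized --workers 1 --logical-compaction-window off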
TLS encryption
Materialize can use Transport Layer Security (TLS) to encrypt traffic between SQL and HTTP clients and the materialized server.
To enable TLS, you will need to supply two files, one containing a TLS certificate and one containing the corresponding private key. Point materialized at these files using the --tls-cert and --tls-key flags:
$ materialized -w1 --tls-cert=server.crt --tls-key=server.key
When TLS is enabled, Materialize serves both unencrypted and encrypted traffic over the same TCP port, as specified by --listen-addr. The web UI will be served over HTTPS in addition to HTTP. Incoming SQL connections can negotiate TLS encryption at the client's option; consult your SQL client's documentation for details.
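As a sketch with psql (assuming the default port and database name, and a psql build with TLS support):
$ psql "postgresql://localhost:6875/materialize?sslmode=require"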
It is not currently possible to configure Materialize to reject unencrypted connections.
Materialize statically links against a vendored copy of OpenSSL. It does not
use any SSL library that may be provided by your system. To see the version of
OpenSSL used by a particular materialized binary, inquire with the -vv flag:
$ materialized -vv
materialized v0.2.3-dev (c62c988e8167875b92122719eee5709cf81cdac4) OpenSSL 1.1.1g 21 Apr 2020 librdkafka v1.4.2
Materialize configures OpenSSL according to Mozilla’s Modern compatibility level, which requires TLS v1.3 and modern cipher suites. Using weaker cipher suites or older TLS protocol versions is not supported.
Generating TLS certificates
You can generate a self-signed certificate for development use with the
openssl command-line tool:
$ openssl req -new -x509 -days 365 -nodes -text \
    -out server.crt -keyout server.key -subj "/CN=<SERVER-HOSTNAME>"
Production deployments typically should not use self-signed certificates. Acquire a certificate from a proper certificate authority (CA) instead.
Experimental mode
New in v0.4.0.
Materialize offers access to experimental features through the --experimental flag. Unlike most features in Materialize, experimental features' syntax and/or semantics can shift at any time, and there is no guarantee that future versions of Materialize will be interoperable with the experimental features.
Using experimental mode means that you are likely to lose access to all of your sources and views within Materialize and will have to recreate them and re-ingest all of your data.
Because of this volatility:
- You can only start new nodes in experimental mode.
- Nodes started in experimental mode must always be started in experimental mode.
We recommend only using experimental mode to explore Materialize, i.e. absolutely never in production. If your explorations yield interesting results or things you’d like to see changed, let us know on GitHub.
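A sketch of a throwaway exploration node (the temporary data directory is a hypothetical choice, kept separate so no existing state is touched):
$ materialized --workers 1 --experimental --data-directory /tmp/mz-experimental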
Disabling experimental mode
Because nodes started in experimental mode must always be started in experimental mode, you cannot disable it for an existing node. To stop using experimental features, start a fresh node without the --experimental flag and recreate your sources and views there.
The --persistence-max-pending-records flag specifies the number of input messages Materialize buffers in memory before flushing them all to disk when using persisted sources. The default value is 1000000 messages. Note that Materialize also flushes buffered records every 10 minutes. See the Deployment section for more guidance on how to tune this parameter.
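To trade some write batching for lower memory usage, you might lower the buffer (the value below is illustrative):
$ materialized --workers 1 --persistence-max-pending-records 100000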