We’re proud to announce that we’ve just released Materialize version 0.4. Here is a quick overview of the main features.
What’s changed in Materialize 0.4
Materialize 0.4 includes a number of stability improvements, identified through customer feedback and through improvements to our own unit tests. We’ve also built a chaos testing harness, which has helped us find further stability issues. Lastly, we’ve devoted time to polishing our sinks, aiming to make them as robust and feature-rich as our sources.
Releasing mz-avro as open source
We’re contributing back to the open-source community by releasing mz-avro, a Rust Avro encoder/decoder. We’ve rewritten the existing avro-rs library to significantly improve performance, correctness, and compliance with the Avro standard. Materialize is now able to interpret many more real-world Avro schemas, including schemas that contain nested records. mz-avro is being released under the Apache 2.0 license.
Support for Confluent Cloud
Confluent Cloud is a hosted Kafka service. To support it, we’ve added SASL PLAIN authentication.
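For readers unfamiliar with SASL PLAIN, here is a minimal sketch of the message the mechanism sends, as defined in RFC 4616: an optional authorization identity, the username, and the password, separated by NUL bytes. This illustrates the wire format only; it is not Materialize’s configuration syntax, and the function name and credential values are placeholders invented for the example.

```python
def sasl_plain_initial_response(username: str, password: str, authzid: str = "") -> bytes:
    """Build a SASL PLAIN message: authzid NUL authcid NUL passwd (RFC 4616)."""
    for part in (authzid, username, password):
        # NUL is the field separator, so it may not appear inside a field.
        if "\x00" in part:
            raise ValueError("SASL PLAIN fields must not contain NUL")
    return b"\x00".join(s.encode("utf-8") for s in (authzid, username, password))


# Placeholder credentials, for illustration only.
message = sasl_plain_initial_response("my-api-key", "my-api-secret")
print(message)
```

Because the password is sent unencrypted inside this message, SASL PLAIN is normally used over a TLS-encrypted connection.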
CDC format updates
Change data capture (CDC) is a common software pattern for listening to changes in a database, usually via a stream of updates. Naturally, many of our customers use CDC tools (like Debezium) as input sources to Materialize when they want a real-time view of their data. (To quickly connect your Postgres or MySQL database without needing to run Kafka or having to configure Debezium, we’ve released tb, which embeds Debezium with most settings pre-configured.)
To aid our CDC customers, we’ve documented the Materialize CDC format, which resembles Debezium’s. We’re also continuing to work towards improving our CDC schema so that Materialize’s CDC output (from a sink) can also be re-ingested as an input (as a source).
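To make the idea of a CDC stream concrete, here is a small sketch of how a consumer might fold Debezium-style change records into the current state of a table. It assumes each record carries a `before` and an `after` row image, as in Debezium’s envelope; the exact fields of the Materialize CDC format are described in our documentation and may differ, and the `apply_change` helper and sample rows are invented for this example.

```python
from collections import Counter


def apply_change(view: Counter, change: dict) -> None:
    """Apply one before/after change record to a multiset of rows."""
    before = change.get("before")
    after = change.get("after")
    if before is not None:
        # The old row image is retracted.
        key = tuple(sorted(before.items()))
        view[key] -= 1
        if view[key] == 0:
            del view[key]
    if after is not None:
        # The new row image is added.
        view[tuple(sorted(after.items()))] += 1


# Hypothetical change stream: two inserts, one update, one delete.
changes = [
    {"before": None, "after": {"id": 1, "status": "new"}},
    {"before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"before": None, "after": {"id": 2, "status": "new"}},
    {"before": {"id": 2, "status": "new"}, "after": None},
]

view = Counter()
for change in changes:
    apply_change(view, change)

rows = [dict(key) for key in view]
print(rows)
```

An insert has only an `after` image, a delete has only a `before` image, and an update has both, so the same two steps handle all three cases.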
Time travel
A common objective with both streams and database tables is to run queries for arbitrary, historical moments in time. Materialize is now capable of creating sinks for precise time travel using CREATE SINK ... AS OF. We’ve also added WITH SNAPSHOT and WITHOUT SNAPSHOT to allow more fine-grained control over whether a sink starts with the full query result at creation time or emits only the changes that follow. We’ve also added the same time travel support to our TAIL SQL statement.
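The difference between the two snapshot options can be sketched with a toy model. The code below is not Materialize’s implementation or its SQL syntax; it assumes, purely for illustration, that a view’s history is a list of (row, timestamp, diff) updates, and the `sink_output` function is a hypothetical name. With a snapshot, the sink first emits the consolidated result as of the chosen time and then every later change; without one, it emits only the later changes.

```python
from collections import Counter

# Toy update log: (row, timestamp, diff), where diff +1 adds a row and -1 removes it.
UPDATES = [
    ("a", 1, +1),
    ("b", 2, +1),
    ("a", 3, -1),
    ("c", 4, +1),
]


def sink_output(updates, as_of, with_snapshot):
    """Return what a sink created AS OF `as_of` would emit in this toy model."""
    out = []
    if with_snapshot:
        # Consolidate everything up to and including `as_of` into one result set.
        state = Counter()
        for row, ts, diff in updates:
            if ts <= as_of:
                state[row] += diff
        out.extend((row, as_of, n) for row, n in sorted(state.items()) if n != 0)
    # In both modes, changes after `as_of` are passed through unchanged.
    out.extend((row, ts, diff) for row, ts, diff in updates if ts > as_of)
    return out


with_snap = sink_output(UPDATES, as_of=2, with_snapshot=True)
without_snap = sink_output(UPDATES, as_of=2, with_snapshot=False)
print(with_snap)
print(without_snap)
```

In this model, a consumer that starts from an empty state needs the snapshot to reconstruct the full result, while a consumer that already holds the result as of that time only needs the subsequent changes.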
Easier to get started
For those new to Materialize, we’ve cleaned up several of our docs, adding details on running Materialize in containerized environments, on observability, and on quickly connecting Materialize to an existing database without Kafka. We’ve also added a web-based interactive demo that allows you to try out Materialize out of the box, without any installation.
What’s coming in 0.5
Performance benchmarks and improvements
Materialize is already being used by customers with workloads in the terabytes and throughput in the hundreds of thousands of records per second. To better quantify the user experience (as well as hold ourselves to a higher standard), we will be publishing comprehensive, reproducible benchmarks of Materialize. We hope these will also help with resource sizing estimates for production deployments. The benchmarks we intend to publish will grade Materialize’s performance on a number of dimensions, including throughput, latency, scalability, and query complexity.
We intend to work iteratively: identify a bottleneck, solve it, then move on to the next one. While some bottlenecks will require larger refactors, this first phase will identify which refactors are needed and set the stage for those larger, subsequent tasks.
We’ve begun to benchmark Materialize against several scenarios, including CH-Benchmark and the Yahoo Streaming Benchmark. Once we have these baseline values, we will be able to continue to evolve Materialize’s performance. We’re not just focused on synthetic performance, however. We’re continuing to improve ingest and query performance based on production customer feedback, with throughput targets of hundreds of thousands to millions of records per second, with latency in the tens of milliseconds.
Continuing to evolve source data persistence
The first release of source data persistence provides repeatability for materialized views and avoids having to re-read source data across restarts. Evolving it will be a multi-release process, but we’re excited to get user feedback on this feature in 0.4!