Materialize is an incremental view maintenance engine, one which takes your SQL queries expressed as views and continually maintains them as your data change. Surely there are a lot of ways one could do this, ranging from the very naïve (just recompute from scratch) to the more sophisticated end of the spectrum (what we do). […]
Last week concluded up my first week at Materialize, with Friday being my first Skunkworks Friday. Skunkworks Friday is a Materialize sponsored day of the week to spend on personal development and learning. Given that it was my first week, I challenged myself to build something using Materialize. Having spent most of my career working […]
Streaming systems consume inputs and produce outputs asyncronously: the output of a system at any moment may not reflect all of the inputs seen so far. These systems provide various guarantees about how their outputs relate to their input. Among the weaker (but not unpopular) guarantees is eventual consistency. Informally, eventual consistency means if the […]
I have some thoughts on the use of Rust for data-intensive computations. Specifically, I’ve found several of Rust’s key idioms line up very well with the performance and correctness needs of data-intensive computing. If you want a tl;dr for the post: I’ve built multiple high-performance, distributed data processing platforms in Rust, and I never learned […]
How do you build a streaming database from scratch?
This is an edited transcript and video of a talk that I gave at Carnegie Mellon’s Database Group Seminar on June 1st, 2020, hosted by Andy Pavlo. You can watch it, or read along! Introduction and Background First things first, you can download Materialize right now! For our agenda today, I’m first going to talk […]
Self-compacting dataflows Those of you familiar with dataflow processing are likely also familiar with the constant attendant anxiety: won’t my constant stream of input data accumulate and eventually overwhelm my system? That’s a great worry! In many cases tools like differential dataflow work hard to maintain a compact representation for your data. At the same […]
In-order reliable message delivery is not enough. Showing views over streams of data requires thinking through additional consistency semantics to deliver correct results.
“Upserts” are a common way to express streams of changing data, especially in relational settings with primary keys. However, they aren’t the best format for working with incremental computation. We’re about to learn why that is, how we deal with this in differential dataflow and Materialize, and what doors this opens up! This post is […]