important

If you’re new here, Materialize is a distributed, cloud-native database with a stream processor in place of the query engine. This means it looks and feels a lot like any other database (and the RBAC implementation is a big part of that) but the streaming internals unlock performant, unlimited incrementally updated materialized views. Teams use Materialize to serve operational work like fraud detection, real-time personalization, feature-serving using the same familiar SQL workflows they use in Analytic Data Warehouses.

Why is it important?

For many companies, data is one of their most valuable resources. However, not all data is treated equally. Production data may be used to power a company’s most critical operations, while testing data may be used by developers to test new features. It’s important to allow broad access to test data to help developers quickly iterate and test. However, production data must be closely guarded. Similarly, different teams may own different subsets of data, and they may want to control visibility and access to other teams. One option is to keep each set of data in separate database. This is a clunky and potentially costly solution. Instead of managing one DBMS, your company now has to manage a single DBMS for each set of data. Additionally, you lose all the performance and cost benefits of co-locating data. Luckily for database and SQL lovers, the SQL standards committee thought about this issue when designing SQL.

What is RBAC?

Role Based Access Control (RBAC) is built directly into SQL and allows users to configure granular access control on database objects. A role is a database object that is associated with one or more users. Privileges on objects are granted to roles and allow that role to take certain actions associated with that object. For example, in order for a role to read from a table, that role must be granted the SELECT privilege on that table.

What does it look like in Materialize?

RBAC in Materialize has three important concepts to understand.

Roles

A role is a special database object that is associated with one or more users. A new role can be created via CREATE ROLE <name>. When a new user logs into Materialize for the first time, we automatically create a new role with the name of the user’s email.

Roles are also hierarchical, a role can be a member of another role and inherit all the other role’s privileges. For example, GRANT role2 TO role1 will add role1 as a member of role2 and cause role1 to inherit all of role2’s privileges (GRANT ROLE docs). Role inheritance allows users to grant privileges to many users at once with a single command.

Some roles have a special status called superuser that allows them to take any action they want without a privilege check. This status is not inherited by member roles. All organization admin roles are automatically given this status.

Privileges

Privileges are granted to roles and allow the role and all of its members to take certain actions in the database. For example, GRANT CREATE ON CLUSTER dev_cluster TO dev will allow the role dev, and all of its members, to create new objects in the cluster dev_cluster. GRANT SELECT on TABLE t TO "joe@materialize.com" will allow the role joe@materialize.com to read from table t.

Most privileges are associated with a specific object, like a table or cluster from our previous examples. Some privileges are associated with the entire system and allow a role to create top level objects. For example GRANT CREATEDB ON SYSTEM TO "mike@materialize.com" allows the role mike@materialize.com to create new databases. For a full list of privileges, and the complete syntax of GRANT, check out our documentation.

Ownership

Every object in the database, except for roles themselves, is owned by some role. The role that creates an object is automatically assigned as the object’s owner, though owners can be changed via ALTER...OWNER.... Only the owning role of an object, any role that’s a member of the owning role, or a superuser can grant privileges on that object and drop that object.

That’s it! With these three concepts, you can put the controls in place to keep your data safe and secure.

What are common use cases?

Let’s look at an example use case for RBAC. Let’s say I have two teams at my company, the Payments team which is responsible for processing customer’s purchases, and the Data Scientist team that is responsible for performing data analytics. First let’s create roles for each team.

sql
CREATE ROLE payments;
CREATE ROLE data_scientists;

Next, we want to add our employees to those roles.

sql
GRANT payments TO "joe@company.com";
GRANT data_scientists TO "aparna@company.com";
...

Then we’ll create separate database and clusters for each team. The databases allow the teams to logically separate their database objects while the clusters allow the teams to physically separate compute resources.

sql
CREATE DATABASE payments_db;
CREATE CLUSTER payments_cluster SIZE = 'large';

CREATE DATABASE data_scientists_db;
CREATE CLUSTER data_scientists_cluster SIZE = 'medium';

Currently, only the role that created these databases and clusters has privileges to interact with them. Usually that’s done by an admin, but can be done by any role with the CREATEDB and CREATECLUSTER system privileges. In order to let our teams use these object, let’s grant privileges to our team roles.

sql
GRANT ALL PRIVILEGES ON DATABASE payments_db TO payments;
GRANT ALL PRIVILEGES ON CLUSTER payments_cluster TO payments;

GRANT ALL PRIVILEGES ON DATABASE data_scientists_db TO data_scientists;
GRANT ALL PRIVILEGES ON CLUSTER data_scientists_cluster TO data_scientists;

If objects already existed in those databases, then you’ll have to manually grant privileges on those objects. For example,

sql
GRANT SELECT, INSERT ON ALL TABLES IN DATABASE payments_db TO payments;
GRANT SELECT, INSERT ON ALL TABLES IN DATABASE data_scientists_db TO data_scientists;

Finally, we probably want to change the default privileges for these databases.

sql
ALTER DEFAULT PRIVILEGES FOR ALL ROLES IN DATABASE payments_db GRANT SELECT, INSERT ON TABLES TO payments;
ALTER DEFAULT PRIVILEGES FOR ALL ROLES IN DATABASE data_scientists_db GRANT SELECT, INSERT ON TABLES TO data_scientists;

Default privileges automatically grant privileges on objects when a new object is created.

With these default privileges, everyone in the payments role can read and write to all relations that get created in the payments_db database. Similarly, everyone in the data_scientists role can read and write to all relations that get created in the data_scientists_db database. You are free to be more or less restrictive with your default privileges.

If someone on the data_scientist team tries to modify the payments teams data, they’ll get a permission error like this:

sql
INSERT INTO payments_db.public.purchase_history VALUES (42);
ERROR:  permission denied for TABLE "payments_db.public.purchase_history"

Suppose that the Data Scientist team had an interesting analysis that they wanted to do on the purchase_history table. Well you can grant them read only access to the table without allowing them to modify it.

sql
GRANT SELECT ON TABLE payments_db.public.purchase_history TO data_scientists;

Now the Data Scientist team can run data analytics to their heart’s delight, and the Payments team can rest easy knowing that the data scientists won’t accidentally modify the data.

sql
materialize=> SELECT AVG(price) FROM payments_db.public.purchase_history;
 avg
-----
 354
(1 row)

materialize=> INSERT INTO payments_db.public.purchase_history VALUES (42);
ERROR:  permission denied for TABLE "payments_db.public.purchase_history"

Check out Materialize Access Control Docs for a full guide on how to use RBAC!

Conclusion

If you’d like to see how Materialize works for your use case, you can get immediate access to our platform with a free 14-day trial here, or you can get in touch with our field engineering team to get a demo and talk through your use case here.

More Articles

Product

Operational Data Warehouse Overview | Materialize

We've built Materialize as a new kind of data warehouse, optimized to handle operational data work with the same familiar process from analytical warehouses.

Frank McSherry
Arjun Narayan
Jessica Laughlin

Sep 12, 2023

Conceptual Article

Lower Data Freshness Costs for Teams | Materialize

Materialize has a subtly different cost model that is a huge advantage for operational workloads that need fresh data.

Arjun Narayan
Andy Hattemer

Aug 29, 2023

Technical Article

Capturing Change Data Capture (CDC) Data

An illustration of the unexpectedly high downstream cost of clever optimizations to change data capture.

Frank McSherry

Aug 1, 2023

Try Materialize Free