BigQuery column-level data masking: How to maintain security and granular control


Fanni Bolyki | 2023-03-08


BigQuery supports dynamic data masking at the column level, which lets teams work with granular data without creating data security problems. This matters because a dataset is likely to contain users’ personally identifiable information (PII). If this is not handled, analysts querying the fine-grained data can cause security incidents. For example, users’ email addresses and social security numbers are confidential and must not be made public under any circumstances. On the other hand, analysts produce the best insights from granular data, so the whole organization benefits when they have access to it. One way to preserve confidentiality is to hide the PII values in specific columns.

BigQuery is a fully managed, cloud-native data warehouse that enables fast SQL queries using the processing power of Google’s infrastructure. It also offers a range of security and data masking features to help organizations protect their sensitive data. Here we show you, step by step, how to implement PII protection using BigQuery’s built-in features.

Create the data policy within BigQuery Data Policies

In BigQuery, you can create policy tags to define access to your data. Consider what kinds of data your organization has and order them into a tree structure. Then consider which team needs access to which data class. For example, one group needs access to business-sensitive data, such as revenue and customer history. Another group needs access to PII like phone numbers and addresses. 
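Before creating the tags, it can help to sketch the tree on paper or in a few lines of code. The following is a minimal Python sketch of such a tree and of a team-to-data-class mapping. The tag names, team names, and helper functions are hypothetical illustrations and are not part of any BigQuery API.

```python
# Hypothetical taxonomy: tag names are illustrative, not taken from a real project.
TAXONOMY = {
    "business_sensitive": {"revenue": {}, "customer_history": {}},
    "pii": {"phone_number": {}, "address": {}},
}

# Hypothetical mapping of teams to the top-level data class they need.
TEAM_ACCESS = {"finance": "business_sensitive", "support": "pii"}


def tag_paths(tree, prefix=()):
    """Return every tag in the tree as a tuple path from the root to the tag."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append(path)
        paths.extend(tag_paths(children, path))
    return paths


def tags_for_team(team):
    """List the tag paths that fall under the data class a team may read."""
    root = TEAM_ACCESS[team]
    return [path for path in tag_paths(TAXONOMY) if path[0] == root]
```

Calling `tags_for_team("support")` lists the `pii` tag and its two child tags, which is the set of tags the hypothetical support team would need access to.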

When a user queries data, BigQuery checks the policy tags of the selected columns. If a selected column is tagged with a data masking rule and the user has permission to access the masked or the original data, then BigQuery executes the query. Otherwise, the user receives an “Access Denied” error message.
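The check described above can be modeled in a few lines of Python. This is only a sketch of the decision logic, not BigQuery’s implementation: the function names and the data shapes are assumptions, untagged columns are assumed to be readable because the user already has table-level access, and a user holding both kinds of permission is assumed to see the original data.

```python
from enum import Enum


class Access(Enum):
    ORIGINAL = "original"
    MASKED = "masked"
    DENIED = "denied"


def resolve_access(column_tag, fine_grained_tags, masked_tags):
    """Decide what a user sees for one column.

    column_tag: the policy tag on the column, or None if the column is untagged.
    fine_grained_tags: tags for which the user may read the original data.
    masked_tags: tags for which the user may read the masked data.
    """
    # Assumption: untagged columns are readable; original access wins over masked.
    if column_tag is None or column_tag in fine_grained_tags:
        return Access.ORIGINAL
    if column_tag in masked_tags:
        return Access.MASKED
    return Access.DENIED


def check_query(selected_columns, column_tags, fine_grained_tags, masked_tags):
    """Return the access level per selected column, or raise if any is denied."""
    result = {}
    for column in selected_columns:
        access = resolve_access(column_tags.get(column), fine_grained_tags, masked_tags)
        if access is Access.DENIED:
            raise PermissionError(f"Access Denied: column '{column}'")
        result[column] = access
    return result
```

In this model, a single column without permission fails the whole query, which mirrors the “Access Denied” behavior described above.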

Create the Data Masking Rule with BigQuery

For each policy tag, we can specify the masking rule. The masking rule defines whether the values are replaced with nulls, hashes, or default masking values. In the ‘Principal’ field, we can also specify which users, groups, or service accounts are eligible to query the masked data. If a user is not added here and has no permission to read the original data either, they will not be able to query the columns to which the policy tag is added.
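To make the difference between the three kinds of rule concrete, here is a small Python sketch that imitates them on string values. It is an illustration only: the rule names and the `mask_value` helper are hypothetical, the empty string is assumed as the default masking value for strings, and the hexadecimal encoding of the hash is a choice made for this sketch rather than a statement about BigQuery’s exact output.

```python
import hashlib


def mask_value(value, rule):
    """Imitate a masking rule on a single string value (illustrative only)."""
    if rule == "nullify":
        # Every value becomes null.
        return None
    if rule == "hash":
        # Equal inputs give equal hashes, so values stay comparable but unreadable.
        if value is None:
            return None
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if rule == "default":
        # Assumption: the default masking value for a string is the empty string.
        return ""
    raise ValueError(f"Unknown masking rule: {rule}")


emails = ["anna@example.com", "bela@example.com", "anna@example.com"]
hashed = [mask_value(email, "hash") for email in emails]
```

In `hashed`, the first and third entries are identical while the second differs, so repeated values can still be counted or grouped without revealing the addresses.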

Grant permission to the principals

In the IAM & Admin menu, we can grant roles to the principals. Two roles are important for querying tagged columns:

  1. Masked Reader: The principals can query the table but see masked data in the tagged columns. A common use case is assigning the Masked Reader role to analysts. In this case, we recommend hashes as the data masking rule: identical values produce identical hashes, so analysts can still see when a PII field differs or changes without accessing its content.
  2. Fine-Grained Reader: The principals can query the table and read the original data. Referring back to the previous use case, this role should not be given to analysts, but to a group that needs the tagged information, e.g. to call the phone number, send an email, or create an invoice for the given address.
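For teams that manage permissions as code rather than through the console, the two grants can be expressed as IAM policy bindings. The sketch below only builds the policy structure as a Python dictionary and does not call any Google Cloud API. The role IDs are the ones we understand to correspond to the two roles above, and the group addresses and the `add_binding` helper are hypothetical.

```python
import copy

# Role IDs assumed to correspond to the Masked Reader and Fine-Grained Reader roles.
MASKED_READER = "roles/bigquerydatapolicy.maskedReader"
FINE_GRAINED_READER = "roles/datacatalog.categoryFineGrainedReader"


def add_binding(policy, role, member):
    """Return a copy of an IAM policy dict with `member` added to `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            # Reuse the existing binding and avoid duplicate members.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Hypothetical groups: analysts see masked data, support sees the original data.
policy = add_binding({}, MASKED_READER, "group:analysts@example.com")
policy = add_binding(policy, FINE_GRAINED_READER, "group:support@example.com")
```

Keeping analysts and the operational team in separate groups makes it easy to verify that nobody ends up in both bindings by accident.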

Apply the data policy to the columns in Dataplex

Dataplex, a Google Cloud Platform service, helps users unify distributed data and automate data management and governance to power analytics at scale. In Dataplex, we can search for BigQuery tables, then, in the ‘Schema and column tags’ section, add any preset policy tag to any column by clicking the + button.
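Behind the + button, the policy tag ends up in the table schema as a `policyTags` entry on the column. The following Python sketch shows that shape on a plain schema dictionary, as used by the BigQuery REST API. It does not contact BigQuery, the `tag_column` helper is hypothetical, and the project, location, taxonomy, and tag IDs in the resource name are placeholders.

```python
import copy


def tag_column(schema_fields, column_name, policy_tag):
    """Return a copy of the schema with `policy_tag` attached to one column."""
    updated = copy.deepcopy(schema_fields)
    for field in updated:
        if field["name"] == column_name:
            field["policyTags"] = {"names": [policy_tag]}
            return updated
    raise KeyError(f"Column not found: {column_name}")


# Placeholder resource name; replace every ID with values from your own taxonomy.
PII_TAG = "projects/my-project/locations/eu/taxonomies/1234/policyTags/5678"

schema = [
    {"name": "user_id", "type": "INTEGER"},
    {"name": "email", "type": "STRING"},
]
tagged_schema = tag_column(schema, "email", PII_TAG)
```

Only the `email` field in `tagged_schema` carries the tag; `user_id` and the original `schema` list are left unchanged.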

[Image: masked reader role]

As a result, principals with the Masked Reader role will see masked values in columns that have a policy tag. With a rule that replaces values with nulls, as in this example, they see null values.


[Image: query results]

The data table and the taxonomy must be in the same location (region). You can find further information about dynamic, column-level data masking in BigQuery in the related GCP documentation.

© Copyright 2023 Aliz Tech Kft.
