Fanni Bolyki | 2023-03-08
BigQuery supports dynamic data masking at the column level, which lets teams work with granular data without creating data security problems. This matters because a dataset is likely to contain users’ personally identifiable information (PII). Email addresses and social security numbers, for example, are confidential and must never be made public, so giving analysts unrestricted access to fine-grained data is a security risk. On the other hand, analysts produce the best insights from granular data, so the whole organization benefits when they can work with it. One way to preserve confidentiality while keeping the data usable is to hide PII values in specific columns.
BigQuery is a fully managed, cloud-native data warehouse that enables fast SQL queries using the processing power of Google’s infrastructure. It also offers a range of security and data masking features to help organizations protect their sensitive data. Here we show you, step by step, how to implement PII protection using BigQuery’s built-in features.
In BigQuery, you can create policy tags to define access to your data. Consider what kinds of data your organization has and organize them into a tree structure (a taxonomy). Then consider which team needs access to which data class. For example, one group needs access to business-sensitive data, such as revenue and customer history. Another group needs access to PII like phone numbers and addresses.
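Before creating anything in the console, it can help to sketch the tree structure and the team-to-class mapping on paper or in a few lines of code. The Python sketch below is only a planning aid, not a BigQuery API call; the tag names, team names, and helper functions are hypothetical and simply mirror the example groups above.

```python
# Hypothetical tree of data classes; the names are illustrative only.
TAXONOMY = {
    "business_sensitive": {"revenue": {}, "customer_history": {}},
    "pii": {"phone_number": {}, "address": {}},
}

# Hypothetical mapping from team to the top-level data classes it may read.
TEAM_ACCESS = {
    "finance_analysts": ["business_sensitive"],
    "support_team": ["pii"],
}


def leaf_paths(tree, prefix=()):
    """Yield every leaf tag as a path such as 'pii/address'."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield "/".join(path)


def tags_for_team(team):
    """Return the leaf tags under the data classes granted to a team."""
    granted = TEAM_ACCESS.get(team, [])
    return sorted(
        path for path in leaf_paths(TAXONOMY) if path.split("/")[0] in granted
    )


print(tags_for_team("support_team"))  # ['pii/address', 'pii/phone_number']
```

Writing the plan down this way makes it easy to spot a team that would end up with no access, or a data class that nobody is supposed to read.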
When a user queries data, BigQuery checks the policy tags of the selected columns. If a selected column is tagged with a data masking rule and the user has permission to access the masked or the original data, then BigQuery executes the query. Otherwise, the user receives an “Access Denied” error message.
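The check described above can be modelled in a few lines of Python. This is a simplified sketch of the decision, not BigQuery’s actual implementation: the `Access` enum, the `resolve_query` function, and the dictionary shapes are assumptions made for illustration, and the sketch treats untagged columns as unrestricted.

```python
from enum import Enum


class Access(Enum):
    ORIGINAL = "original"  # the user may read the real values
    MASKED = "masked"      # the user may read only the masked values


def resolve_query(selected_columns, column_tags, user_grants):
    """Decide, per selected column, what the user is allowed to see.

    column_tags: column name -> policy tag name (untagged columns are absent).
    user_grants: policy tag name -> Access the user holds on that tag.
    Raises PermissionError if any selected tagged column is not granted.
    """
    result = {}
    for column in selected_columns:
        tag = column_tags.get(column)
        if tag is None:
            # Assumption of this sketch: no policy tag means no restriction.
            result[column] = Access.ORIGINAL
            continue
        access = user_grants.get(tag)
        if access is None:
            raise PermissionError(f"Access Denied: column '{column}'")
        result[column] = access
    return result


column_tags = {"email": "pii"}
print(resolve_query(["user_id", "email"], column_tags, {"pii": Access.MASKED}))
```

Note that the whole query fails as soon as one selected column is not covered by a grant, which matches the “Access Denied” behaviour described above.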
For each policy tag, we can specify the masking rule. The masking rule defines whether the values are replaced with nulls, hashes, or default masking values. We can also specify in the ‘Principal’ field which users, groups, or service accounts are eligible to query the masked data. If a user is not added here, they will not be able to query the columns to which the policy tag is added.
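To make the three rule types concrete, here is a small Python sketch of what each one does to a single value. It only imitates the rules locally: the `mask` function, the rule names, and the `DEFAULT_VALUES` table are our own assumptions, and the hex-encoded SHA-256 output is an illustrative choice that may not match the exact encoding BigQuery returns.

```python
import hashlib

# Assumed per-type defaults for the "default masking value" rule.
DEFAULT_VALUES = {"STRING": "", "INT64": 0, "BOOL": False}


def mask(value, rule, column_type="STRING"):
    """Apply one of three illustrative masking rules to a single value."""
    if rule == "nullify":
        return None
    if rule == "hash":
        # Hex-encoded SHA-256; BigQuery's own output encoding may differ.
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    if rule == "default":
        return DEFAULT_VALUES[column_type]
    raise ValueError(f"unknown masking rule: {rule}")


email = "jane@example.com"  # made-up sample value
for rule in ("nullify", "hash", "default"):
    print(rule, "->", repr(mask(email, rule)))
```

Hashing is useful when analysts still need to join or count distinct values, because equal inputs produce equal hashes, while nulls and default values remove that information entirely.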
In the IAM & Admin menu, we can grant roles to the principals. There are two important roles allowing a query of masked data:
Dataplex, a Google Cloud Platform service, helps users unify distributed data and automate data management and governance to power analytics at scale. In Dataplex, we can search for BigQuery tables, then in the ‘Schema and column tags’, we can add any preset policy tag to any column by clicking the + button.
As a result, principals with the Masked Reader role will see masked values (nulls, if the nullify rule was chosen) in columns that have a policy tag.
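Clicking the + button ultimately records the policy tag in the table schema. As far as we know, the schema JSON used by the BigQuery API stores it on the column as a `policyTags` object with a `names` list of policy tag resource names. The Python sketch below only manipulates such a schema as plain data; the `attach_policy_tag` helper, the column names, and the project, location, and ID values in the resource name are placeholders, not real resources.

```python
import copy

# Placeholder policy tag resource name; the project and IDs are made up.
PII_TAG = "projects/my-project/locations/eu/taxonomies/123/policyTags/456"

schema = [
    {"name": "user_id", "type": "INTEGER"},
    {"name": "email", "type": "STRING"},
]


def attach_policy_tag(schema, column, tag):
    """Return a copy of the schema with `tag` attached to `column`."""
    updated = copy.deepcopy(schema)
    for field in updated:
        if field["name"] == column:
            field["policyTags"] = {"names": [tag]}
            return updated
    raise KeyError(f"no column named {column!r}")


tagged = attach_policy_tag(schema, "email", PII_TAG)
print(tagged[1])
```

Working on a copy keeps the original schema untouched, so the change can be reviewed before it is applied to a real table.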
The data table and the taxonomy must be within the same project. You can find further information about dynamic, column-level data masking in BigQuery in the related GCP documentation: