BigQuery supports dynamic data masking at the column level, which lets teams work with granular data without creating data security problems. This matters because a dataset is likely to contain users’ personally identifiable information (PII). If we do not handle this, analysts querying the fine-grained data can expose it. For example, users’ email addresses or social security numbers are confidential: they must not be made public under any circumstances. On the other hand, analysts produce the best insights from granular data, so the whole organization benefits from making it available to them. One way to preserve confidentiality is to hide PII values in specific columns.
BigQuery is a fully managed, cloud-native data warehouse that enables fast SQL queries using the processing power of Google’s infrastructure. It also offers a range of security and data masking features to help organizations protect their sensitive data. Here we show, step by step, how to implement PII protection using BigQuery’s built-in features.
In BigQuery, you can create policy tags to define access to your data. Consider what kinds of data your organization has and organize them into a tree structure, called a taxonomy. Then consider which team needs access to which data class. For example, one group needs access to business-sensitive data, such as revenue and customer history. Another group needs access to PII like phone numbers and addresses.
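Before creating anything in BigQuery, it can help to write the planned tree down. The sketch below is only a planning aid in plain Python: the tag names, group addresses, and helper functions are hypothetical, and it does not call any Google Cloud API or model how BigQuery itself resolves permissions.

```python
# Hypothetical taxonomy: tag names are examples, not BigQuery defaults.
TAXONOMY = {
    "business_sensitive": {
        "revenue": {},
        "customer_history": {},
    },
    "pii": {
        "phone_number": {},
        "address": {},
        "email": {},
    },
}

# Hypothetical mapping from groups to the top-level data class they need.
GROUP_ACCESS = {
    "finance-analysts@example.com": ["business_sensitive"],
    "support-team@example.com": ["pii"],
}


def tag_paths(tree, prefix=()):
    """Yield every tag in the tree as a tuple path starting at the root."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from tag_paths(children, path)


def tags_for_group(group):
    """Return the sorted tag paths under the data classes a group needs."""
    wanted = set(GROUP_ACCESS.get(group, []))
    return sorted("/".join(p) for p in tag_paths(TAXONOMY) if p[0] in wanted)


if __name__ == "__main__":
    for group in GROUP_ACCESS:
        print(group, tags_for_group(group))
```

Listing the tags per group this way makes it easier to spot a data class that no team has asked for, or a team that needs tags from more than one branch.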
When a user queries data, BigQuery checks the policy tags of the selected columns. If a selected column is tagged with a data masking rule and the user has permission to access the masked or the original data, then BigQuery executes the query. Otherwise, the user receives an “Access Denied” error message.
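The check described above can be modelled in a few lines of Python. This is a simplified sketch, not BigQuery's implementation: the `"original"` and `"masked"` labels, the function names, and the rule that access to original values wins when a user holds both are assumptions made for illustration. Ordinary dataset and table permissions are not modelled.

```python
from enum import Enum


class Access(Enum):
    ORIGINAL = "original values"
    MASKED = "masked values"
    DENIED = "Access Denied"


def column_access(column_tag, user_grants):
    """Decide what a user gets back for one column.

    column_tag: policy tag on the column, or None if the column is untagged.
    user_grants: dict mapping a policy tag to the set of access levels the
        user holds on it ("original" and "masked" are labels for this sketch).
    """
    if column_tag is None:
        # Untagged columns are treated as unrestricted in this sketch.
        return Access.ORIGINAL
    levels = user_grants.get(column_tag, set())
    if "original" in levels:
        # Assumption: access to original values takes precedence over masking.
        return Access.ORIGINAL
    if "masked" in levels:
        return Access.MASKED
    return Access.DENIED


def query_access(columns, user_grants):
    """Return per-column access, or raise if any selected column is denied."""
    result = {}
    for name, tag in columns.items():
        access = column_access(tag, user_grants)
        if access is Access.DENIED:
            raise PermissionError(f"Access Denied: {name}")
        result[name] = access
    return result


if __name__ == "__main__":
    selected = {"order_id": None, "email": "pii"}
    print(query_access(selected, {"pii": {"masked"}}))
```

Note that a single denied column fails the whole query in this model, which matches the all-or-nothing behaviour described in the text.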
For each policy tag, we can specify the masking rule. The masking rule determines whether the values are replaced with nulls, hashes, or default masking values. Also, we can specify in the ‘Principal’ field which users, groups, or service accounts are eligible to query the masked data. If a user is not added here, they will not be able to query the columns to which the policy tag is attached.
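To make the three kinds of rule concrete, here is a minimal Python sketch of what each one does to a single value. It is an illustration only: the rule names, the `mask_value` helper, and the table of per-type defaults are assumptions, and the hash is shown as a hex digest even though the exact encoding BigQuery returns is not modelled here.

```python
import hashlib

# Assumed per-type defaults for this sketch; check the BigQuery
# documentation for the values it actually uses for each column type.
DEFAULT_VALUES = {"STRING": "", "INTEGER": 0, "FLOAT": 0.0, "BOOLEAN": False}


def mask_value(value, rule, column_type="STRING"):
    """Apply one masking rule to a single value."""
    if rule == "nullify":
        # Every value is replaced with NULL (None in Python).
        return None
    if rule == "hash":
        # Deterministic SHA-256 hash; this sketch only handles strings.
        if not isinstance(value, str):
            raise TypeError("hash masking in this sketch needs a string")
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if rule == "default":
        # Every value is replaced with a fixed default for the column type.
        return DEFAULT_VALUES[column_type]
    raise ValueError(f"unknown masking rule: {rule}")


if __name__ == "__main__":
    email = "jane.doe@example.com"
    for rule in ("nullify", "hash", "default"):
        print(rule, "->", repr(mask_value(email, rule)))
```

The practical difference is what analysts can still do with the masked column. Nulls and default values remove all information, while a deterministic hash still lets analysts count distinct users or join tables on the masked column without seeing the underlying value.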
In the IAM & Admin menu, we can grant roles to the principals. Two roles are important for querying masked data:
Dataplex, a Google Cloud Platform service, helps users unify distributed data and automate data management and governance to power analytics at scale. In Dataplex, we can search for BigQuery tables and then, under ‘Schema and column tags’, add any preset policy tag to any column by clicking the + button.
As a result, principals with the Masked Reader role will see null values in columns that have a policy tag.
The data table and the taxonomy must be within the same project. You can find further information about dynamic, column-level data masking in BigQuery in the related GCP documentation: