As we prepare for our workshop “A Big Data Adventure on the Google Cloud Platform” at Crunch Data Engineering and Analytics Conference, we have started to collect some tips and tricks when it comes to using Google’s Big Data platform. This first collection of advice shows how to control your costs when using BigQuery (BQ), discusses the kinds of built-in tools you have access to, and explains the related best practices. In the second blog post, we show how you can boost your BigQuery performance. Read this blog post to find out how to take control of your Bigquery costs.
The BigQuery pricing is usage-based, you pay for the amount of data you store, query, and insert by streaming. This seems fair, although it has some consequences. When you have practically unlimited resources — as in the case of BigQuery — this pricing policy becomes a double-edged sword: it is easy to spend a lot if you are not cautious enough. When it comes to budget planning, you should know how much data you will store and process in the future and this can make it challenging to estimate your costs. Fortunately, there are some tools and best practices that can help you to control your costs.
Probably the most difficult is to plan the costs if you’ve never actually used BQ. Google provides a pricing calculator tool that helps in these situations. It estimates your future costs based on the parameters we’ve already mentioned: storage, streaming inserts, queries. Of course, to get a result, you will still have to estimate the size of the datasets involved.
BQ provides a couple of built-in tools to continuously control your spending and stop you going over budget.
Check the amount of data BQ will process during the query before actually running that query. Simply click on the green exclamation mark (Validator) in the right bottom of the window. If you also add the BQ Mate extension to your Chrome browser, you will see how much the query will cost, in USD.
Set the project-level soft limit by requesting a billing alert. When you exceed the threshold you have set for the current month, the billing administrators will receive an email notification. Set the project-level hard limit by maximizing the number of bytes processed per day within the project. If you enable the BQ custom quotas feature, you can specify this limit in 10 TB increments. To activate this feature or to modify the value, submit the BQ Custom Quota Request form. When you exceed the quota, queries will return errors.
Set the user-level hard limit within a project by maximizing the number of bytes processed per day by a given user. The method is the same as in case of the project-level hard limit, except that the error message is different when you exceed the quota.
If you have multiple projects, and you want to monitor the overall budget usage, set billing alerts for your entire billing account. The process is the same as point 2a.
Not really. Beyond these built-in features, there are several best practices that can help to reduce your costs.
Set the data expiration date on datasets. After a given time, all the tables in the dataset will be deleted so you won’t have to pay for storage. This is especially useful if you are in an experimentation phase with BQ. Another benefit of this feature is that you can prevent the overgrowth of tables and datasets in your projects.