Google is heavily investing in making the process of migrating on-premises data warehouses to Google BigQuery as streamlined, fast, and cost-effective as possible with the Google BigQuery Migration Service. In this blog post, we introduce you to the Batch SQL Translator tool, describe the steps to translate Teradata SQL to BigQuery’s Standard SQL, and provide code samples for performing the translation both from the Cloud Console and programmatically.
With the increasing adoption of cloud services, migrating on-premises data warehouses to the public cloud is a hot topic. The Forrester Wave™: Cloud Data Warehouse, Q1 2021 report named Google BigQuery, GCP’s cloud data warehouse service, the leader among the 13 most significant solutions in this space, thanks to its strong current offering and market strategy. Google BigQuery is a natural choice when it comes to migrating existing solutions, since BigQuery is a fully managed, serverless, highly scalable, and cost-effective product.
Google is heavily investing in ensuring that customers have the best experience while migrating their existing data infrastructure to Google BigQuery. Google Cloud features multiple services to ease the migration from on-premises to the cloud, such as BigQuery Data Transfer Service, a solution to automate data movement, and Google BigQuery Migration Service, a comprehensive set of free-to-use tools to streamline the migration of data warehouses from start to finish. CompilerWorks, a company developing products to automatically convert and optimize legacy code (including SQL) to run in the cloud, was acquired by Google in 2021.
We at Aliz have helped a number of organizations successfully migrate their data wealth to Google BigQuery in a way that’s fast, efficient, and cost-effective while providing agility and value to the business. Teradata is often the traditional, on-premises database that GCP customers want to migrate to the cloud. In cooperation with KPMG, Aliz supported Otto Group’s Bonprix in migrating their on-premises Teradata setup to Google BigQuery. The migration completed on time and on budget, and proved to be 40%-50% more cost-effective than the on-premises data warehouse. GCP and Google BigQuery enabled Bonprix’s IT team to shift focus from infrastructure management to creating business value. To read the complete case study, refer to https://cloud.google.com/customers/bonprix.
The process of translating the existing SQL statements of an on-premises system into the SQL dialect of the destination cloud data warehouse is a crucial step of a cloud migration journey, but it can be especially complex and time-consuming. Reducing this effort matters: it minimizes the developer hours needed to master the new dialect and manually rewrite SQL statements, greatly speeding up the migration. Tools that automate the translation are therefore crucial.
Google BigQuery’s batch SQL translator, part of Google BigQuery Migration Service, supports translating from Teradata SQL and Basic Teradata Query (BTEQ) scripts to BigQuery’s Standard SQL dialect. The free-to-use tool automatically takes a collection of source files from Google Cloud Storage and outputs the translated BigQuery Standard SQL statements along with a detailed summary of the warnings and errors encountered during the translation process. Optional settings provide a way to configure the SQL translation in detail to achieve the desired results. The batch SQL translator provides client libraries for multiple programming languages, including Java, Go, and Python. Google BigQuery Migration Service also offers an interactive SQL translator that is well integrated into the Cloud Console and the BigQuery GUI. Teradata concepts that do not exist in the same form in BigQuery are intelligently converted to their matching BigQuery functionality; this includes converting Teradata index definitions into BigQuery clustering fields.
To demonstrate the use of Google BigQuery’s batch SQL translator with the Teradata SQL dialect, let’s consider the following simple data model representing the orders, customers, products, and currencies of a fictional e-commerce firm.
The corresponding Teradata SQL (data definition language) could look like the following minimal sketch (the table and column names are illustrative, not the original listing):
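```sql
-- Illustrative Teradata DDL for the sample e-commerce model;
-- names and types are assumptions.
CREATE TABLE ecommerce.currency (
  currency_id INTEGER NOT NULL,
  iso_code    CHAR(3) NOT NULL,
  name        VARCHAR(50)
)
UNIQUE PRIMARY INDEX (currency_id);

CREATE TABLE ecommerce.customer (
  customer_id INTEGER NOT NULL,
  full_name   VARCHAR(100),
  email       VARCHAR(100)
)
UNIQUE PRIMARY INDEX (customer_id);

CREATE TABLE ecommerce.product (
  product_id  INTEGER NOT NULL,
  name        VARCHAR(100),
  unit_price  DECIMAL(10,2),
  currency_id INTEGER
)
UNIQUE PRIMARY INDEX (product_id);

CREATE TABLE ecommerce.orders (
  order_id    INTEGER NOT NULL,
  customer_id INTEGER NOT NULL,
  product_id  INTEGER NOT NULL,
  quantity    INTEGER,
  order_date  DATE
)
PRIMARY INDEX (order_id);
```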
To use the interactive SQL translator embedded in the BigQuery GUI, click “Compose New Query”, then click “Enable SQL translation” under “More”.
After pasting the Teradata SQL into the left-hand side of the screen (under “Translating from: Teradata”), click on the button “Translate”.
The translated Standard SQL appears on the right-hand side of the screen under “BigQuery Standard SQL”. Warnings are clearly indicated in comments around the affected statements. In case of errors in the Teradata SQL, the erroneous parts are highlighted on the left-hand side.
Google BigQuery’s batch SQL translator allows for programmatic translation of large volumes of Teradata SQL files to BigQuery Standard SQL in a timely manner through an easy-to-use API. By way of example, I will instruct the batch SQL translator to translate a Teradata SQL file (containing the sample data warehouse SQL shared previously) stored on Google Cloud Storage, using its Python client library. Note that Google provides client libraries for other languages, such as Go and Java, as well. The sample code is shared in a GitHub repository.
Let’s create a new Google Cloud Storage bucket and upload the sample e-commerce data warehouse’s Teradata SQL as a file at the path “teradata_sql/ddl/ddl.sql”.
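A minimal sketch of this step with the google-cloud-storage client library (the project and bucket names are hypothetical):

```python
from google.cloud import storage

# Hypothetical project and bucket names.
storage_client = storage.Client(project="my-project")
bucket = storage_client.create_bucket("my-translation-bucket", location="US")

# Upload the sample Teradata DDL to the expected input path.
blob = bucket.blob("teradata_sql/ddl/ddl.sql")
blob.upload_from_filename("ddl.sql")
```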
The Python client library needs to be installed to run this sample script.
Importing the following dependencies is necessary.
The contents of the file “translation_pb2.py” were generated from the Google-provided protocol buffer message definitions using the protocol buffer compiler “protoc”.
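Google also publishes a packaged client library for the BigQuery Migration API; a minimal setup sketch using it (the package name and the project ID are assumptions, and it replaces the hand-generated “translation_pb2.py” approach described above):

```python
# pip install google-cloud-bigquery-migration
from google.cloud import bigquery_migration_v2

# Client for the BigQuery Migration API.
client = bigquery_migration_v2.MigrationServiceClient()

project_id = "my-project"  # hypothetical GCP project ID
```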
First, we need to define the migration task details, which include the task type and the translation configuration, such as the input and output paths on Cloud Storage and the source and target SQL dialects.
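A sketch of such a task with the packaged v2 client library (the paths refer to the hypothetical bucket created above; the exact property names differ from the protoc-generated messages):

```python
# Translation task: Teradata SQL -> BigQuery Standard SQL.
task = bigquery_migration_v2.MigrationTask(
    type_="Translation_Teradata2BQ",
    translation_config_details=bigquery_migration_v2.TranslationConfigDetails(
        gcs_source_path="gs://my-translation-bucket/teradata_sql/ddl",
        gcs_target_path="gs://my-translation-bucket/standard_sql/ddl",
        source_dialect=bigquery_migration_v2.Dialect(
            teradata_dialect=bigquery_migration_v2.TeradataDialect(
                mode=bigquery_migration_v2.TeradataDialect.Mode.SQL
            )
        ),
        target_dialect=bigquery_migration_v2.Dialect(
            bigquery_dialect=bigquery_migration_v2.BigQueryDialect()
        ),
    ),
)
```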
If translating DML (data manipulation language) statements, an optional property “schema_path” can be passed, pointing to the DDL (data definition language) that accompanies the DQL or DML (data query/manipulation language) statements, for a better translation.
Second, the migration workflow needs to be defined. Such a migration workflow may consist of multiple migration tasks, allowing a number of different steps to be combined.
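Continuing the sketch, the workflow wrapping the task above can be created as follows (the display name and the task key are arbitrary):

```python
workflow = bigquery_migration_v2.MigrationWorkflow(
    display_name="teradata-ddl-translation",
    tasks={"translation-task": task},  # further tasks could be combined here
)

# Register the workflow with the BigQuery Migration Service.
workflow = client.create_migration_workflow(
    parent=f"projects/{project_id}/locations/us",
    migration_workflow=workflow,
)
```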
Finally, the freshly created workflow needs to be started. This is a non-blocking operation: the workflow starts running after the Google BigQuery Migration Service object’s “start_migration_workflow” method is called, but the call returns with a unique identifier of the started workflow before the workflow completes.
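In the sketch, this looks as follows (the workflow’s unique resource name comes from the create call above):

```python
# Non-blocking: requests the workflow to start and returns immediately.
client.start_migration_workflow(name=workflow.name)
print(f"Started workflow: {workflow.name}")
```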
The status of the started workflow can be tracked using the method “get_migration_workflow”.
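A simple polling loop for the sketch:

```python
import time

# Poll until the workflow reaches the COMPLETED state.
State = bigquery_migration_v2.MigrationWorkflow.State
wf = client.get_migration_workflow(name=workflow.name)
while wf.state != State.COMPLETED:
    time.sleep(10)
    wf = client.get_migration_workflow(name=workflow.name)
print("Translation workflow completed")
```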
As soon as the workflow’s state is “COMPLETED”, the freshly translated Standard SQL appears at the respective path in the Cloud Storage bucket.
The file “ddl.sql” containing the translated Standard SQL (which has similar contents as the ones seen in the BigQuery UI’s interactive SQL translator) is accompanied by a CSV file named “ddl_errors.csv”, which collects all the detected errors and warnings with clear descriptions of the detected issues and references to specific parts of the original Teradata SQL file.
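To inspect the results in the sketch, the output path can be listed with the storage client from earlier:

```python
# List the translated SQL and the accompanying error report.
for blob in storage_client.list_blobs(
    "my-translation-bucket", prefix="standard_sql/ddl"
):
    print(blob.name)  # e.g. standard_sql/ddl/ddl.sql and ddl_errors.csv
```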
Both methods result in BigQuery Standard SQL translated from the Teradata SQL provided. As mentioned previously, the output clearly indicates the choices the tool made during the translation, such as converting Teradata primary indexes (which do not exist in BigQuery) to clustering fields in the target data warehouse. Possible errors are also displayed in the output in the form of comments. Importantly, the translator tools can separate the individual SQL statements: failed statements are indicated as comments, while the successfully translated ones are included in their translated form, ready to be executed.
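As an illustration of what the output could look like for the sketched “orders” table (the actual output depends on the input DDL and the translator version):

```sql
-- Illustrative translated output: the Teradata PRIMARY INDEX has no
-- direct BigQuery equivalent and becomes a clustering specification.
CREATE TABLE ecommerce.orders (
  order_id    INT64 NOT NULL,
  customer_id INT64 NOT NULL,
  product_id  INT64 NOT NULL,
  quantity    INT64,
  order_date  DATE
) CLUSTER BY order_id;
```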
The BigQuery interactive and batch SQL translators, both pre-GA (General Availability) offerings, come with several limitations.
Quotas apply while using the BigQuery Migration API. At the time of writing, the individual SQL file sizes and the total input and schema file sizes per translation job are limited to the range of 30-100 MB. This already allows for translating large SQL files, but we have hit these quotas while translating very large Teradata SQL dumps. This can easily be circumvented by splitting the large SQL files into smaller ones and processing them in separate translation jobs. Schema changes like column or table renames between the source Teradata data warehouse and BigQuery will not be translated. The lack of schema-mapping support can be solved with a further processing step on the translated output, replacing the affected identifiers in the generated SQL. Also, Teradata functions and procedures are not supported as of now.
In this blog post, we demonstrated the use of Google BigQuery’s interactive and batch translation services, both from the Cloud Console UI and programmatically. Relying on this easy-to-use toolset can greatly speed up your migration from your on-premises Teradata to Google BigQuery. Automatically translating your existing Teradata SQL without the need for manual rewrites heavily reduces the invested engineering time, making your migration faster and greatly reducing its price tag. The programmatic (batch) translator allows you to handle a large number of Teradata SQL statements at the same time and provides client libraries for multiple programming languages.