AlloyDB is a fully managed PostgreSQL-compatible database service that supports ACID-compliant transactions (atomicity, consistency, isolation, and durability). Some of its claims are:
This experiment focuses on an analytical query performance comparison between AlloyDB and BigQuery. There is an official Google Cloud blog on benchmarking AlloyDB, but it compares against standard PostgreSQL. The benchmark setup here follows that guide, with additional steps to run the same queries on BigQuery.
Let’s take a look at AlloyDB components.
We create a cluster that organizes all resources, such as databases, logs, and other metadata.
A cluster contains several nodes, which are virtual machine instances dedicated to running queries. AlloyDB organizes nodes into instances. There are two kinds of instances:
Two Google Cloud blog articles explain transactional and analytical workload aspects of AlloyDB for PostgreSQL under the hood:
1. Intelligent, database-aware storage
Some of the key highlights from this article are:
The log processing service (LPS) is an AlloyDB internal service that processes WAL records and produces database blocks. The compute layer only sends WAL records to the storage layer; no block writes are needed from the compute layer.
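To make the division of labor concrete, here is a toy sketch (not AlloyDB's actual implementation) of the idea: the compute layer emits only log records, and a log-processing component materializes blocks from them.

```python
from dataclasses import dataclass


@dataclass
class WalRecord:
    """A simplified write-ahead-log record: which block changed and its new contents."""
    block_id: int
    payload: str


class LogProcessingService:
    """Toy stand-in for AlloyDB's LPS: replays WAL records into materialized blocks.

    In AlloyDB, the storage layer's LPS turns WAL records into database
    blocks; the compute layer never writes blocks itself.
    """

    def __init__(self):
        self.blocks = {}

    def apply(self, record):
        # Materialize (or overwrite) the block described by the log record.
        self.blocks[record.block_id] = record.payload


# The "compute layer" ships WAL records only:
lps = LogProcessingService()
for rec in [WalRecord(1, "row v1"), WalRecord(1, "row v2"), WalRecord(2, "other row")]:
    lps.apply(rec)

# Block 1 now reflects the latest WAL record for that block.
```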
(Slide from the AlloyDB Partner Office Hours earlier this year)
There are multiple performance optimizations on other parts, such as replication and recovery, which makes the article worth checking. The two points above are highlights closely related to the transactional aspect.
A key highlight from the article is that AlloyDB uses machine learning to intelligently select the best tables/columns to keep in columnar format and later automatically maintains this data in memory. Note that for analytical benchmarking, we need to execute the queries first as a "warm-up" so AlloyDB can observe the workload; only after that do we measure performance.
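A minimal benchmarking harness sketch that follows this warm-up-then-measure pattern might look like the following. The `run_query` callable is a placeholder; a real harness would execute SQL against AlloyDB through a database driver.

```python
import statistics
import time


def benchmark(run_query, warmups=2, runs=5):
    """Run a query several times, discarding warm-up executions.

    Because AlloyDB's columnar engine adapts to the observed workload,
    the first executions may not be served from columnar storage yet;
    only the post-warm-up runs are recorded.
    """
    for _ in range(warmups):
        run_query()  # warm-up: let the engine adapt, do not record timing
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Usage with a placeholder workload standing in for a SQL query:
median_s = benchmark(lambda: sum(range(100_000)))
```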
(Slide from the AlloyDB Partner Office Hours earlier this year)
The columnar engine not only allows aggregation operations directly on the relevant columns without materializing the results of the scan, but also includes several algorithmic optimizations that utilize column-specific metadata, such as min-max values, to speed up the scan.
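The min-max idea can be sketched in a few lines (a simplified illustration, not AlloyDB's internals): each column chunk carries min/max metadata, and a range predicate skips any chunk whose range cannot contain matching values.

```python
def prune_chunks(chunks, lo, hi):
    """Keep only the column chunks whose [min, max] metadata overlaps [lo, hi].

    A filter like `WHERE x BETWEEN lo AND hi` can then skip the pruned
    chunks entirely, without scanning their values.
    """
    return [c for c in chunks if c["min"] <= hi and c["max"] >= lo]


chunks = [
    {"min": 0,   "max": 99,  "values": "..."},
    {"min": 100, "max": 199, "values": "..."},
    {"min": 200, "max": 299, "values": "..."},
]

# A predicate on the range 150..180 only needs to scan the middle chunk:
survivors = prune_chunks(chunks, 150, 180)
```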
This slide below is a recommendation from the Partner Office Hours on this topic, which we will try to check on the benchmark.
We use the TPC-H dataset for the benchmark. TPC-H is a decision-support (i.e., analytical) database benchmark defined and managed by the Transaction Processing Performance Council.
In this TPC-H setup, we have eight tables whose size can be configured, and queries consisting of joins and aggregations that represent analytical workloads.
You can check the TPC-H specification in the official document, specifically Section 2.4, to see which queries are used.
TPCH_SCALE will determine the size of the data across the tables. For example:
- TPCH_SCALE=10 will result in a total row count of 86,586,082 (~20 GB).
- TPCH_SCALE=30 will result in a total row count of 259,798,402 (~60 GB).
This experiment will use AlloyDB with this specification:
- database version: PostgreSQL 14
- region: us-central1
- machine type: 16vCPU / 128GB
Both TPCH scale 10 and 30 will fit in the columnar engine memory. The database flags that will be set are:
Other details, such as loading the data and the client machine used, are covered in the code repository.
Below are 22 analytical queries that were executed on AlloyDB and BigQuery. The query runtimes and the geometric mean are in seconds.
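The geometric mean is the usual summary statistic for benchmark suites because one outlier query cannot dominate the result the way it would with an arithmetic mean. It can be computed as follows (the runtimes shown are illustrative, not the measured results):

```python
import math


def geometric_mean(runtimes):
    """Geometric mean of query runtimes in seconds."""
    return math.exp(sum(math.log(t) for t in runtimes) / len(runtimes))


# Illustrative example: a 2 s and an 8 s query have a geometric mean of 4 s
# (their arithmetic mean would be 5 s, skewed toward the slow query).
example = geometric_mean([2.0, 8.0])  # 4.0
```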
Based on this specific experiment, AlloyDB can deliver analytical query performance comparable to BigQuery's up to a certain threshold, in this case ~60 GB of data. All queries completed in under 1 minute, which is arguably tolerable considering that AlloyDB's primary use case is transactional workloads. The advertised hybrid transactional and analytical processing (HTAP) capability of AlloyDB seems promising.
Beyond ~60 GB, however, the comparison becomes questionable, since we would need to add more resources to the AlloyDB cluster, and its performance would likely fall behind BigQuery, which completed all of these queries in under 10 seconds without table partitioning or clustering.