Evolving data structures in a continuously operated application with a NoSQL database can be challenging. These are some of the experiences we’ve had so far in the development and operation of AODocs, a cloud-native, serverless document management system used by millions of users all around the world.
I’m not saying I miss developing classic Java EE applications with relational databases, but certain things were undoubtedly easier.
In many cloud-native applications with NoSQL databases, the classic approach of stopping the application for maintenance and migrating the schema in one step doesn’t really work.
Because we don’t have maintenance downtimes, there will be periods, even if very short ones, when multiple application versions access the same database. We also do progressive rollouts of major versions, where two application versions live simultaneously for weeks. But even if we just push a new hotfix version, there will be a few minutes when requests from the old version are still executing while the new version has already started serving. This is unavoidable without an actual downtime.
These two application versions have to manage the data in a compatible way, both backwards and forwards. If a request from the new version updates the data in the new way, the old version still has to be able to read it. If the old version then overwrites the data with the old schema, the new version still has to be able to read it and operate normally.
Any change to the data structure has to be very carefully designed and coordinated. Even a slight change is usually planned across 3 application versions: vN is where we first make the feature available to customers, groundwork to ensure compatibility usually starts in the previous version (vN-1), and cleanup happens in the following one (vN+1).
We are currently working with Java 8 on App Engine, with Cloud Datastore. In this setup, people usually use Objectify, the de-facto standard object-mapping library for Datastore in Java.
Let’s assume we have a simple entity:
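```java
// A sketch of such an entity; the class and field names are illustrative.
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import lombok.Data;

@Entity
@Data
public class UserSettings {
    @Id
    private Long id;

    // Whether the user wants to receive reminder notifications from us.
    private boolean reminderEnabled;
}
```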
The boolean field you see above stores whether the given user wishes to receive notifications from our application. (The @Data annotation is from Lombok.)
Let’s assume we’d like to evolve our reminder feature to allow the users to set the reminder frequency to daily. One way to represent this is to change the boolean field to an enum, supporting three values: NONE, ONCE, DAILY.
Note: this is not a real example from our codebase, and the representation is intentionally not ideal. This particular change could be modeled in other ways that require only a schema extension rather than a schema change. But not all model changes can be implemented without a schema change, and it’s often better to update the schema than to stick with a worse data model.
So we’ll have a schema update mapping that looks like this:
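```java
// A sketch of the new representation; the names are illustrative.
public enum ReminderMode {
    NONE,  // maps from reminderEnabled == false
    ONCE,  // maps from reminderEnabled == true
    DAILY  // new value, with no boolean equivalent
}
```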
In version vN-1 we need to be able to read entities that were written according to the new structure. One way to do this in our example is to use the @AlsoLoad annotation provided by Objectify:
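```java
// vN-1 sketch: the entity still declares only the boolean, but it can also
// ingest the reminderMode property written by vN. Names are illustrative.
import com.googlecode.objectify.annotation.AlsoLoad;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import lombok.Data;

@Entity
@Data
public class UserSettings {
    @Id
    private Long id;

    private boolean reminderEnabled;

    // Called by Objectify when the stored entity contains a reminderMode property.
    void importReminderMode(@AlsoLoad("reminderMode") String reminderMode) {
        if (reminderMode != null) {
            // Default backwards mapping: any mode other than NONE counts as enabled.
            this.reminderEnabled = !"NONE".equals(reminderMode);
        }
    }
}
```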
If this code in version vN-1 encounters an entity with the new structure, it applies a default backwards mapping. Note that Objectify drops properties of the underlying entity that have no corresponding declared Java field, so if this version saves the entity, the reminderMode property will be gone again.
Then, in version vN, we can rely primarily on the new field, but we still need to be able to read entities written with the old schema:
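```java
// vN sketch: the enum is now the primary field; the old boolean property is
// only read for compatibility with entities written by vN-1.
import com.googlecode.objectify.annotation.AlsoLoad;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import lombok.Data;

@Entity
@Data
public class UserSettings {
    @Id
    private Long id;

    private ReminderMode reminderMode;

    // Called by Objectify when the stored entity still has the old boolean property.
    void importReminderEnabled(@AlsoLoad("reminderEnabled") Boolean reminderEnabled) {
        if (reminderMode == null && reminderEnabled != null) {
            reminderMode = reminderEnabled ? ReminderMode.ONCE : ReminderMode.NONE;
        }
    }
}
```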
But what happens to the DAILY value?
You probably noticed that our backwards mapping from the enum to the boolean is not complete. If a customer sets the value to DAILY on the new version and an old version then updates the entity, the DAILY setting is simply lost, and the user is back to ONCE reminders.
This cannot always be completely avoided; solving this depends on the actual case.
If the risk or severity of undesired behavior during the rollout period, when two application versions operate at the same time, is high, we can use feature flags so that customers can only start leveraging the new feature once the version rollout is complete.
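For example, a guard along these lines could reject the new value while the rollout is in progress (the featureFlags service and the flag name are hypothetical):

```java
// Hypothetical guard in vN’s service layer; featureFlags is some injected
// service, and "daily-reminders" a flag we flip once the rollout completes.
void updateReminderMode(UserSettings settings, ReminderMode requested) {
    if (requested == ReminderMode.DAILY && !featureFlags.isEnabled("daily-reminders")) {
        throw new IllegalArgumentException("Daily reminders are not available yet.");
    }
    settings.setReminderMode(requested);
}
```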
The reminders example could actually be migrated in two versions: we could avoid having to prepare for the migration in vN-1 by declaring both the boolean and the enum fields in vN and keeping them in sync.
This approach is more often applied when the data structure change is purely technical (not user-facing) or functionally equivalent.
The code example for our case:
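```java
// Two-version sketch: vN declares both fields and keeps them in sync, so vN-1
// can keep reading and writing the boolean without any modification.
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.OnLoad;
import com.googlecode.objectify.annotation.OnSave;
import lombok.Data;

@Entity
@Data
public class UserSettings {
    @Id
    private Long id;

    // Old field: still persisted, so vN-1 remains fully functional.
    private boolean reminderEnabled;

    // New field: the source of truth in vN.
    private ReminderMode reminderMode;

    @OnLoad
    void deriveReminderMode() {
        if (reminderMode == null) {
            // Entity written by vN-1 (or before the rollout of vN).
            reminderMode = reminderEnabled ? ReminderMode.ONCE : ReminderMode.NONE;
        }
    }

    @OnSave
    void syncReminderEnabled() {
        // Keep the boolean coherent for vN-1 readers.
        reminderEnabled = reminderMode != ReminderMode.NONE;
    }
}
```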
In this particular case, no code modification is required in vN-1. This cannot always be achieved. The general rule is that we have to ensure vN-1 works properly when it reads an entity written by vN. For example, if we add a new possible value to an enum-typed field, we have to declare and somehow handle that value in the previous version, otherwise we’ll get an exception when reading the entity.
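As a sketch, the preparation in the previous version can be as small as declaring the constant (the enum here is hypothetical, unrelated to the reminders example):

```java
// Hypothetical example: vN introduces a new ARCHIVED state. vN-1 must already
// declare the constant, otherwise Objectify fails to load entities written by vN.
public enum DocumentState {
    DRAFT,
    PUBLISHED,
    ARCHIVED // declared in vN-1 purely for forward compatibility; vN-1 never sets it
}
```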
Let’s assume we have an entity that can be edited by customers. At some point we realize that the display name of this entity should be unique: users shouldn’t be able to have two entities with the same name. The rule of 3 versions also applies here.
There’s nothing to do in vN-1 to ensure compatibility. (This may depend on how we interpret the rules, but let’s start here.)
vN will start applying the uniqueness constraint. The problem is that vN-1 can still create duplicate names, and we haven’t migrated the data yet, meaning that any previously created duplicates are still there. Depending on where exactly we apply the uniqueness check, this can be problematic, for example if the check also prevents system actions on the affected entities.
Another approach is to enforce the name uniqueness rule only when an actual user is updating the value. This cannot always be easily identified: a regular REST API call, for example, may come from our frontend or from an automated client, so we cannot be sure whether there’s an actual user at the other end who can meaningfully handle our error message, or whether we would block some business integration process with our error.
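As a sketch, a user-initiated rename in vN could be validated with a query along these lines (the entity and field names are illustrative; note that a plain Datastore query is eventually consistent, so a production-grade check would typically maintain a dedicated name-index entity in a transaction):

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

// Illustrative check, applied only on user-initiated renames.
void assertDisplayNameIsUnique(String displayName, Long selfId) {
    Document existing = ofy().load().type(Document.class)
            .filter("displayName", displayName)
            .first()
            .now();
    if (existing != null && !existing.getId().equals(selfId)) {
        throw new IllegalArgumentException(
                "An entity named '" + displayName + "' already exists.");
    }
}
```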
When vN is fully rolled out, we can execute a migration process that automatically assigns unique names to the problematic entities.
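A sketch of what such a migration could do (a real one would run in batches and would also have to guard against collisions among the generated names):

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import java.util.HashMap;
import java.util.Map;

// Illustrative one-off migration: append a counter to duplicated display names.
void deduplicateDisplayNames() {
    Map<String, Integer> occurrences = new HashMap<>();
    for (Document doc : ofy().load().type(Document.class)) {
        int count = occurrences.merge(doc.getDisplayName(), 1, Integer::sum);
        if (count > 1) {
            // The first occurrence keeps its name; later ones get a numeric suffix.
            doc.setDisplayName(doc.getDisplayName() + " (" + count + ")");
            ofy().save().entity(doc).now();
        }
    }
}
```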
In vN+1 we can remove the relaxed checks and enforce the uniqueness rule completely.
All this sounds cumbersome. And it really is. Often it’s not just cumbersome but also complex: while the general pattern applies, every case is a bit different and requires very careful consideration, design, and execution.
We’ve added multiple checkpoints in various parts of our development process to ensure that all changes that might affect compatibility are noticed and properly managed.