Continuous Delivery & Databases: Mission Possible
Achieving continuous delivery (CD) while including database-related operations tends to appear counterintuitive. Databases are the powerful center of many applications. They’re carefully managed with slow, infrequent, or manual deployments to minimize risk and potential impact. CD, on the other hand, favors small, high-frequency changes and automated deployments. So how is it possible to balance these two forces?
Teams adopting or striving for continuous delivery often have problems bringing their database along because they don’t change their approach. Teams must adopt new ways of shipping features and promoting changes through the deployment pipeline in order to include both databases and applications. This post covers solutions to known problems and how specialized tooling keeps CD pipelines moving for applications and databases.
The Database Bottleneck
Problems start with database changes (NoSQL databases included). You’ve encountered these issues before: the problematic uniqueness constraint, the order-dependent database migration and code change, or renaming a column. These scenarios are problematic because they assume the coupling of code and database changes. But the solution is deceptively simple: decouple the two and work in smaller batches. Let’s walk through some common scenarios.
Uniqueness and other constraints
The issues mentioned above entail a change in application code and database schema. Working in smaller batches means breaking this change into two parts. One part updates the database, and the other updates the application code. The code may be deployed behind a feature flag that activates the new functionality once the uniqueness constraint is in place. The database change may then be deployed when the time is right. This approach fits most other RDBMS column constraints as well.
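As a minimal sketch of the feature-flag half of this approach (all names here — `FEATURE_FLAGS`, `create_user`, the email example — are hypothetical, and the flag would be flipped only after the `UNIQUE` constraint has shipped to the database):

```python
# Decoupling a new uniqueness constraint from the code that relies on it.
# The flag stays off until the database-side UNIQUE constraint is deployed;
# the constraint itself then acts as a backstop for this application check.

FEATURE_FLAGS = {
    "enforce_unique_email": False,  # flipped on after the DB change ships
}

class DuplicateEmailError(Exception):
    pass

def create_user(db, email):
    """Create a user record; enforce uniqueness only when the flag is on."""
    if FEATURE_FLAGS["enforce_unique_email"]:
        if db.get("email:" + email) is not None:
            raise DuplicateEmailError(email)
    db["email:" + email] = {"email": email}
    return db["email:" + email]
```

Because the two halves deploy independently, either one can be rolled back without stranding the other.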
Renaming a column
This is a “stop-the-world” scenario. Odds are that application code will not work because it expects a specific database schema. Again, the solution is to roll the change out incrementally.
The first change adds the new database column; the second updates the application code to read from and write to the new column; the third removes the old database column. There are variations to this approach, depending on how much logic needs to be added in the application code, but the key element is that it minimizes risk and keeps the pipeline moving.
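The three steps above can be sketched against an in-memory SQLite database (the table and column names are hypothetical, and in practice each step would ship as its own migration and deployment, not one script):

```python
# Expand/contract steps for renaming "fullname" to "display_name".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")

# Step 1 (expand): add the new column alongside the old one.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Step 2 (migrate): application code now writes both columns; a backfill
# copies existing rows into the new column.
conn.execute(
    "UPDATE users SET display_name = fullname WHERE display_name IS NULL"
)

# Step 3 (contract): once nothing reads the old column, drop it.
try:
    # DROP COLUMN requires SQLite >= 3.35; older versions rebuild the table.
    conn.execute("ALTER TABLE users DROP COLUMN fullname")
except sqlite3.OperationalError:
    pass
```

Between steps 2 and 3 the application tolerates both schemas, which is what keeps each deployment individually reversible.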
Evolving schemaless data
A related scenario generally applies to NoSQL databases but may also arise with JSON data types in PostgreSQL and even messaging queues. Handling schemaless data requires a different technical mindset than working with an RDBMS. The “schema” concept does not exist in the datastore itself, so it must be added in the application layer, allowing for greater flexibility.
Consider an application working with constantly evolving data structures stored in JSON. The application may insert a “version” key (e.g., “without_field_x,” “feature_x_support,” or simply “3”) into each data structure and use a parser to read data at version “without_field_x” into the most up-to-date version. Data may also be written back to the database in the updated version in order to migrate each record in real time. The Apache Avro serialization project uses a similar approach to maintain backward and forward compatibility between different versions.
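A minimal sketch of that read-time upgrade, using the version labels from the example (the labels, the `field_x` default, and the function names are all hypothetical):

```python
# Versioned schemaless records, upgraded as they are read.
import json

CURRENT_VERSION = "feature_x_support"

def upgrade(record):
    """Migrate a stored record to the current version."""
    if record.get("version") == "without_field_x":
        record["field_x"] = None  # default for data written before field_x existed
        record["version"] = "feature_x_support"
    return record

def read_record(raw_json):
    """Parse a raw document and return it at the current version."""
    record = upgrade(json.loads(raw_json))
    # A real application might also write the upgraded record back to the
    # datastore here, migrating each record in real time as it is touched.
    return record
```

Writing upgraded records back on read is what gradually migrates the whole dataset without a stop-the-world batch job.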
One cannot assume that deployments are atomic or that features are released in single deployments. Instead, features are largely released through a series of small batches, which requires that the application code be more cognizant of the current database state. This approach does, however, introduce new technical concerns and process changes.
A New Orientation
Achieving continuous delivery requires a buy-in from many different stakeholders. But buying into this new approach may prove difficult since some may consider it too risky or out of line with industry standards. Some even harbor the misconception that CD and its associated practices increase risk. Management then attempts to hedge this risk by adding gatekeepers (like DBAs), change advisory boards, or even external change advisory boards to the process.
Accelerate: The Science of Lean Software and DevOps and The DevOps Handbook refute these ideas, with the authors finding and demonstrating that these strategies negatively impact performance. They conclude that teams don’t need the newest or trendiest architecture to achieve continuous delivery, since their research finds DevOps principles and practices at work in legacy systems, embedded software, databases, and everything in between.
All engineers must undergo a mental shift in order to see their work through a new lens and take advantage of the tools available to seamlessly mesh DB-related ops and CD. Embracing this new orientation begins with automation, testing, removing gatekeepers, and promoting shared responsibility.
Engineers’ and DBAs’ responsibilities must also change as duties will shift between the many technical contributors involved. Your pipeline will have to accommodate this new dynamic, something that should be easily attainable since any and all technical contributors speak code as a lingua franca.
Automate with code
Teams should automate as much manual work as possible for common tasks, such as preparing developer environments, handling promotions across different environments, and environment setup/teardown. This practice is known as Infrastructure as Code. There are a variety of tools that implement it, including Chef, Puppet, and Ansible. All of them are similar enough, so choose whichever feels right and hit the ground running. Bear in mind that adopting any of them is a long-term commitment, and you’ll run into edge cases and uncertainties regardless of which you choose. It’s helpful to research whether anyone has solved your specific problem with the tool in question, and to commit to learning as you go, solving automation problems with the tools you have on hand. The more you practice, the better you’ll become.
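To make this concrete, here is a hedged sketch of what such automation might look like as an Ansible playbook (the host group, database name, and `migrations/` layout are all hypothetical; real setups typically delegate ordering to a dedicated migration tool, since `with_fileglob` does not guarantee file order):

```yaml
# Hypothetical playbook: provision a dev database and apply SQL migrations.
- hosts: db_servers
  tasks:
    - name: Ensure PostgreSQL is installed
      ansible.builtin.package:
        name: postgresql
        state: present

    - name: Apply migrations (a migration tool should enforce ordering)
      ansible.builtin.command: "psql -d appdb -f {{ item }}"
      with_fileglob:
        - "migrations/*.sql"
```

The point is less the specific tool than that environment setup becomes reviewable, versioned code rather than a runbook of manual steps.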
Infrastructure automation code requires testing and maintenance just like other code. Routinely test the automation to see if it can create a new DB from scratch and apply migrations in sequence. This ensures that subsequent engineers can build new environments and that migrations apply correctly. Ambitious teams can catch even more regressions by populating test databases with production-like data before running migrations and other database changes. Such checks are a perfect vantage point to assess progress made, and more ambitious teams can even automate blue-green zero-downtime deployments.
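The “build from scratch and apply migrations in sequence” check can be sketched as a small CI helper (the `migrations/*.sql` naming convention with numeric prefixes is an assumption; it makes lexicographic sort match the intended order):

```python
# CI check: build a fresh database and apply every migration in order.
import sqlite3
from pathlib import Path

def build_from_scratch(migrations_dir):
    """Create an empty database and run each .sql migration in sorted order."""
    conn = sqlite3.connect(":memory:")
    for path in sorted(Path(migrations_dir).glob("*.sql")):
        conn.executescript(path.read_text())
    return conn
```

Running this on every commit catches out-of-order or broken migrations long before they reach a shared environment.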
Accelerate with purpose-built tools
Specially designed database tooling helps manage deployment pipelines as well. These tools are force multipliers in common situations. They can be used to keep different databases in sync as code progresses through the pipeline or to guard against configuration drift if changes are only pushed to production. Tooling also provides visibility into the current state for all team members so that problems in staging don’t make it through to production, and it illuminates database-specific changes so they’re easier to identify, review, test, and deploy.
There are other benefits beyond visibility. Database tooling integrates into the CD pipeline so databases can be just like other applications. This is especially helpful if the application or framework does not include built-in support for database migrations or historical state tracking.
Continuous delivery with databases and applications is possible with the proper practices in place. Database changes must be treated similarly to application changes and must be kept under SCM and tested during CI. Furthermore, larger and riskier changes must be broken up into smaller independent batches so they can be deployed automatically. Teams may opt for separate application and database deployment pipelines or choose to club them together.
Achieving CD requires a buy-in from all parties involved. Developers must revisit how they view the database, and DBAs need to collaborate more with developers to understand how database changes can be broken up into smaller and more testable batches. These factors combine to move organizations towards a DevOps value stream and increase engineering productivity as well as business success. All of this also entails getting people to work together more. This may be a significant shift, but it is worth it. After all is said and done, you can see how the seemingly counterintuitive goal of implementing DB-related operations into your CD pipeline is in fact perfectly doable. In other words, you can have your cake and eat it too. Who doesn’t like that?