9 DevOps KPIs to Optimize Your Database
DevOps people like things to be automated, testable, and disaster-proof, and stateless services can achieve those features quite easily.
After all, you only need to recreate the instances from the images, and you're good to go. In a best-case scenario, this can take as little time as a few seconds.
Databases, on the other hand, belong to a world that many engineers either shun or dread. They are stateful and therefore require extra care in terms of disaster recovery, scaling, updates, and migration. Since databases present such a challenge, it’s no coincidence that most major cloud providers offer them as managed services.
There are even external DBaaS (Database-as-a-Service) providers, which suggests that having a responsible third party manage your database can reduce development and maintenance costs.
DevOps-ifying the Database
Whether you choose to administer your database on-premises or use a managed service, you need to track some Key Performance Indicators if you want to truly embrace DevOps culture.
These KPIs are not about "performance" in the narrow technical sense, measured, for example, in operations per second. They take a more holistic approach, measuring your performance as the manager of a whole system consisting of the product, the process, and the team behind it.
Below is a list of nine DevOps KPIs we find to be the most useful when working with agile databases. This list is by no means comprehensive and some of the KPIs may not even apply to the system you're developing. Still, they should interest you and make you think about your own performance metrics as well as how to improve your DevOps practices.
1. Availability
A DB is only useful when you can either query it or write to it. That's obvious. But it also means availability, measured either as complete availability (read/write) or partial availability (read-only), is one of the most important metrics for a database. Aiming for 100% availability seems like a great idea at first, but if you take into account necessary maintenance, such as security upgrades, a smaller figure becomes much more realistic.
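To make the trade-off between an availability target and maintenance time concrete, here is a minimal sketch. The function names and the 30-day period are assumptions chosen for illustration, not part of any particular monitoring tool.

```python
from datetime import timedelta


def availability(period: timedelta, downtime: timedelta) -> float:
    """Fraction of the period during which the database was reachable."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    return 1.0 - downtime / period


def downtime_budget(period: timedelta, target: float) -> timedelta:
    """Maximum downtime that still meets the availability target."""
    return period * (1.0 - target)


month = timedelta(days=30)  # assumed reporting period
print(f"measured availability: {availability(month, timedelta(minutes=43)):.4%}")
print(f"downtime budget at 99.9%: {downtime_budget(month, 0.999)}")
```

The same two functions can be applied separately to read/write and read-only outages if you track complete and partial availability as distinct figures.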
2. Recovery Time
Are you familiar with the concept of antifragility? An antifragile system does more than survive failures: it gets stronger by being exposed to them. Your DB should aim for the same quality. Recovery from backup should be tested on a regular, scheduled basis and in an automated manner, so that every rehearsed failure leaves you better prepared. Problems will arise, so don’t sit around pondering, "What if?" Prepare for the inevitable.
That said, your automated recovery tests should take into account how long it takes to fully recover from something bad happening. Will the recovery fit into the maintenance window? Better yet, will it fit there twice? If not, what can you do to make it fit? This metric can impact the previous one, so good planning and architecture before the first deployment can save you a lot of precious time later on when disaster strikes.
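The question of whether a recovery fits into the maintenance window, ideally twice, can be checked automatically. The sketch below assumes your restore procedure can be wrapped in a Python callable; `timed_restore` and `fits_window` are hypothetical helper names.

```python
import time
from datetime import timedelta
from typing import Callable


def timed_restore(restore: Callable[[], None]) -> timedelta:
    """Run a restore procedure and return how long it took."""
    start = time.monotonic()
    restore()
    return timedelta(seconds=time.monotonic() - start)


def fits_window(recovery: timedelta, window: timedelta, attempts: int = 2) -> bool:
    """True if the recovery could run `attempts` times within the window."""
    return recovery * attempts <= window


# Stand-in for a real restore job; replace with your own procedure.
elapsed = timed_restore(lambda: time.sleep(0.01))
print(fits_window(elapsed, timedelta(hours=2)))
```

Running this on a schedule and recording `elapsed` gives you a trend line, so a recovery that slowly creeps toward the window limit is noticed before a real incident.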
3. Query Response Time
If your database is available and ready for disaster, there is an additional DevOps KPI that decides whether it's actually delivering business value to your customers. That metric is query response time: how long it takes, in the worst case, to either fetch the necessary data or write it.
Even if your database replies 24/7, if the simplest query takes a lot of time, you need to investigate. Just as with poor availability, poor response time can lead to losses. Customers unable to navigate an e-commerce store will find a different supplier. Workers trying to input the same measurement again and again will be too busy to do other meaningful work. And automated scripts may even stop retrying and drop the data altogether.
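Because the worst case matters more than the average here, it helps to look at high percentiles of the measured latencies. The following sketch uses the nearest-rank method on a made-up list of samples; the sample values are illustrative, not measurements.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical query latencies in milliseconds.
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 14, 13]
print("median:", percentile(latencies_ms, 50))
print("p99:", percentile(latencies_ms, 99))
print("worst:", max(latencies_ms))
```

In this sample the median looks healthy while the 99th percentile exposes the single slow query, which is exactly the kind of outlier an average would hide.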
4. CPU/RAM Performance
When you provision your systems, you need to make some trade-offs. Smaller CPU and RAM provisioning means lower costs but can also cause poor performance, leading to longer query response times. Bigger CPU and RAM usually make it easier to handle spikes in traffic, but the extra cost may outweigh that benefit. This is where CPU/RAM performance monitoring helps.
First of all, this DevOps KPI gives you insight into how many resources your database actually utilizes. With this you can see if the DB is under-provisioned or over-provisioned and can adjust the scale if necessary. But more importantly, long-term monitoring makes it easier to note any spikes and other usage trends.
For example, most e-commerce sites are aware that December is generally a month of high traffic, while other months are much slower. Knowing when you need to scale to meet changing demand can make managing costs far easier.
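A simple way to turn utilization samples into a provisioning hint is sketched below. The 30% and 80% thresholds are assumptions picked for illustration; sensible values depend on your workload and on how quickly you can scale.

```python
from typing import Sequence


def provisioning_hint(utilization: Sequence[float],
                      low: float = 0.30, high: float = 0.80) -> str:
    """Classify a series of utilization samples (0.0 to 1.0) against two thresholds."""
    if not utilization:
        raise ValueError("no samples")
    peak = max(utilization)
    if peak > high:
        return "under-provisioned"  # peaks leave too little headroom
    if peak < low:
        return "over-provisioned"   # even the busiest sample is mostly idle
    return "ok"


# Hypothetical CPU utilization samples for two periods.
print(provisioning_hint([0.10, 0.15, 0.20]))
print(provisioning_hint([0.55, 0.70, 0.92]))
```

Feeding it samples grouped by month would surface seasonal patterns such as the December peak mentioned above.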
5. I/O & Network Throughput
Connected with the previous metric is I/O and network throughput. Even though these values are monitored in much the same way as CPU and RAM, the tactics for acting on them differ. Whereas it's quite easy these days to scale CPU and RAM provisioning up and down, it's not that easy to change disk or network I/O capacity.
An instance can be connected to one network link at a time, and changing it during runtime is far from convenient. Of course, if you're utilizing 10 Gigabit Ethernet, you can shape the traffic to split the throughput between several instances. But if you want to scale higher, you generally need to replace a bunch of hardware, such as network adapters, switches, and sometimes even cabling.
The same goes for disk storage. Hardware throughput does not come in small increments as RAM does, but falls into distinct classes (think HDD, SATA SSD, and NVMe SSD over PCI Express).
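Since throughput capacity comes in large steps, it is useful to know how close you are to the current ceiling. The sketch below derives throughput from two readings of a byte counter and compares it with the link capacity; the counter values and the 10 Gbit/s figure are illustrative assumptions.

```python
def throughput_bps(bytes_start: int, bytes_end: int, seconds: float) -> float:
    """Average throughput in bits per second between two byte-counter readings."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (bytes_end - bytes_start) * 8 / seconds


def link_utilization(bps: float, capacity_bps: float) -> float:
    """Share of the link (or disk class) capacity that is currently in use."""
    return bps / capacity_bps


# Hypothetical readings taken 10 seconds apart on a 10 Gbit/s link.
rate = throughput_bps(0, 1_250_000_000, 10)
print(f"{rate / 1e9:.2f} Gbit/s, {link_utilization(rate, 10e9):.0%} of the link")
```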
6. Growth Over Time
Generally speaking, most databases grow over time. After all, that's what you use them for: to store data for the future. This means you need to know in advance when to scale up disk capacity as the data accumulates.
Monitoring growth over time as a DevOps KPI can provide this knowledge, showing how much data you are gathering and when you will outgrow the current capacity. It can also help in predicting backup costs. This metric may even lead to a switch of architecture as you may decide to store some of the data in a less-accessible but more cost-effective way.
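Predicting when you will outgrow the current capacity can start with a straight-line fit over daily size measurements. This is a minimal sketch that assumes roughly linear growth and one sample per day; `days_until_full` is a hypothetical name.

```python
from typing import Optional, Sequence


def days_until_full(sizes_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until capacity is reached from daily sizes (oldest first).

    Uses an ordinary least-squares slope; returns None if the data is not growing.
    """
    n = len(sizes_gb)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2
    mean_y = sum(sizes_gb) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(sizes_gb))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance  # growth in GB per day
    if slope <= 0:
        return None
    return (capacity_gb - sizes_gb[-1]) / slope


# Hypothetical daily database sizes in GB on a 128 GB volume.
print(days_until_full([100, 102, 104, 106, 108], capacity_gb=128))
```

If your growth is seasonal or accelerating, a linear fit will be too optimistic, so treat the result as a lower bound on urgency rather than a forecast.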
7. Lead Time
This metric is a bit different from the previous ones. So far, we have been mainly discussing the various characteristics of a database itself regardless of environment. But DevOps KPIs should account for more than just pure DBA metrics.
Lead time is a metric that shows how long it takes for a change to take place. You start the timer when the code is pushed to the repository and stop it when said code is running in production. In a proper DevOps pipeline, this would mean that the code passes code review and the necessary tests, is then deployed to the staging environment where it passes acceptance tests, and finally is deployed to production.
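Measured this way, lead time is simply the difference between two timestamps per change. The sketch below computes it for a few made-up changes and reports the median; the timestamps are illustrative and would normally come from your repository and deployment logs.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(pushed_at: datetime, deployed_at: datetime) -> timedelta:
    """Time from the push to the repository until the code runs in production."""
    if deployed_at < pushed_at:
        raise ValueError("deployment cannot precede the push")
    return deployed_at - pushed_at


# Hypothetical (pushed, deployed to production) timestamps.
changes = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 15, 30)),
    (datetime(2024, 3, 2, 10, 0), datetime(2024, 3, 3, 10, 0)),
    (datetime(2024, 3, 4, 8, 0), datetime(2024, 3, 4, 10, 0)),
]
hours = [lead_time(pushed, deployed) / timedelta(hours=1) for pushed, deployed in changes]
print("median lead time in hours:", median(hours))
```

The median is used here instead of the mean so that a single change stuck in review for days does not dominate the figure.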
8. Change Volume
This indicates exactly what is being shipped each time production is updated. How many new features are implemented? How many user stories are resolved? How many bugs are fixed?
If you track all these data, you can go even further and predict delivery based on the mean time to implement a user story. However you use it, change volume helps manage the project by showing the exact dynamics between a backlog and the actual state of production.
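Counting changes per release and projecting delivery from the mean time per user story can be sketched in a few lines. The release contents and the 1.5 days per story are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def change_volume(changes: Iterable[Tuple[str, str]]) -> Counter:
    """Count the changes in a release by kind (feature, story, bug)."""
    return Counter(kind for kind, _ in changes)


def forecast_days(open_stories: int, mean_days_per_story: float) -> float:
    """Naive delivery estimate: backlog size times mean time per story."""
    return open_stories * mean_days_per_story


# Hypothetical contents of one production release.
release = [
    ("feature", "export report to CSV"),
    ("bug", "fix connection timeout"),
    ("story", "user can reset password"),
    ("bug", "correct rounding in totals"),
]
print(change_volume(release))
print("days to clear a 12-story backlog:", forecast_days(12, 1.5))
```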
9. Deployment Frequency
This DevOps KPI is pretty self-explanatory. In general, deployment frequency shows how often code gets deployed to production. In more detail, it should also note how many deployments fail and at what stage bugs are found: during code review, in unit tests, or as late as acceptance testing on the staging environment.
Failed deployments are not bad per se. You just need to catch potential failures as early as possible, since fixing the code is usually much easier before it enters the release branch.
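Frequency, failure rate, and the stage at which failures surface can all be derived from one list of deployment attempts. The record layout below is an assumption made for this sketch, as are the sample dates and stage names.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (date, stage where a failure was caught, or None on success).
deployments = [
    (date(2024, 1, 1), None),
    (date(2024, 1, 4), "unit tests"),
    (date(2024, 1, 9), None),
    (date(2024, 1, 14), "acceptance tests"),
]


def deployments_per_week(records) -> float:
    """Average number of deployment attempts per week over the observed span."""
    days = [day for day, _ in records]
    span_days = (max(days) - min(days)).days + 1
    return len(records) / (span_days / 7)


def failure_rate(records) -> float:
    """Share of deployment attempts that failed at any stage."""
    return sum(1 for _, stage in records if stage is not None) / len(records)


def failures_by_stage(records) -> Counter:
    """How many failures were caught at each stage."""
    return Counter(stage for _, stage in records if stage is not None)


print(deployments_per_week(deployments), failure_rate(deployments))
print(failures_by_stage(deployments))
```

A shift of failures from later stages toward earlier ones over time is a sign that problems are being caught sooner.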
What Are Your Favorite DevOps KPIs?
We found these nine KPIs to be the most interesting when dealing with databases in DevOps environments. But, as we already mentioned, the list is far from complete. There are other metrics you can measure as well, and they may even make much more sense for your company than those we proposed.
If you have any interesting DevOps KPIs you would like to share with us, please post them in the comments section along with their use case. We don't believe there is a one-size-fits-all solution for this challenge, but we do believe exchanging ideas leads to improvement.