Running PostgreSQL on Kubernetes: An Overview
Containerization is spreading beyond software development into database administration, and database operators are exploring container orchestration as a way to manage databases. PostgreSQL is a leading open-source relational database management system, and Kubernetes is a popular container orchestration tool. By combining them, database administrators can use the automation, scalability, and versatility of Kubernetes to build a highly available SQL database system. Here, we’ll look at how Kubernetes and PostgreSQL can work together and cover some tips to get you started.
What Is PostgreSQL?
PostgreSQL (often called Postgres) is a free, open-source object-relational database management system that uses SQL to perform queries. Some of the main features of Postgres include:
- Custom data types—users can define their own data types, such as composite, enumerated, and range types, or base types implemented with custom functions.
- Complex queries—supports joins, subqueries, window functions, and common table expressions for complex read and write operations.
- Multi-version concurrency control (MVCC)—keeps multiple versions of each row so that many users can read and write the same database concurrently. Each transaction sees a consistent snapshot, and readers do not block writers.
- ACID compliance—transactions are atomic, consistent, isolated, and durable, which keeps data valid even in the event of errors, crashes, or outages.
- SQL standard compliance—complies with at least 150 of the 164 mandatory features required for full Core conformance with the SQL standard (ISO/IEC 9075).
- Active community—a large and involved community of contributors develops and supports the project.
- Extensibility—users can add the functionality they need as extensions, which package related objects such as new functions, operators, and index operator classes into a single installable unit.
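The multi-version concurrency control bullet is easier to picture with a deliberately simplified Python sketch of snapshot visibility. It borrows the idea behind PostgreSQL's xmin/xmax row-version markers. However, the RowVersion class, the read function, and the model of a snapshot as "the set of transaction IDs committed when it was taken" are illustrative simplifications. Real PostgreSQL visibility rules also handle in-progress transactions, subtransactions, and a transaction's own uncommitted changes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class RowVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that deleted or replaced it, if any


def is_visible(version: RowVersion, snapshot: FrozenSet[int]) -> bool:
    """Visible if its creator had committed when the snapshot was taken
    and its deleter (if any) had not."""
    created = version.xmin in snapshot
    deleted = version.xmax is not None and version.xmax in snapshot
    return created and not deleted


def read(versions: List[RowVersion], snapshot: FrozenSet[int]) -> List[str]:
    """Return the values this snapshot sees; old versions stay readable."""
    return [v.value for v in versions if is_visible(v, snapshot)]


# Transaction 1 inserted the row; transaction 3 later updated it, which in an
# MVCC system writes a new version instead of overwriting the old one.
history = [
    RowVersion("balance=100", xmin=1, xmax=3),
    RowVersion("balance=80", xmin=3),
]

old_reader = frozenset({1})     # snapshot taken before transaction 3 committed
new_reader = frozenset({1, 3})  # snapshot taken after transaction 3 committed

print(read(history, old_reader))  # ['balance=100']
print(read(history, new_reader))  # ['balance=80']
```

The update created a new version rather than overwriting the old one. As a result, the reader holding the older snapshot keeps seeing a consistent value without waiting for the writer.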
What Is Kubernetes?
Kubernetes is an open-source container orchestration system, originally designed at Google and now maintained by the Cloud Native Computing Foundation (CNCF). It streamlines how applications are delivered by automating the deployment, scaling, and management of containerized applications. Some of Kubernetes’ key features include:
- Storage orchestration—automatically mounts the storage system of your choice, such as local disks, storage from major public cloud providers, or network storage, in on-premises, cloud, and hybrid environments.
- Load balancing—distributes network traffic across Pods so that no single instance is overloaded.
- Self-healing—restarts failed containers and replaces unhealthy ones with new, healthy containers.
- Automated resource allocation—lets you specify how much CPU and memory each container needs. Kubernetes then places containers on nodes that can satisfy these requirements.
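The resource-allocation bullet can be illustrated with a small Python sketch. The container dictionary mirrors the real requests and limits fields of a Kubernetes container spec. The fits function is a simplified version of the scheduler's resource-fit check, which compares a Pod's requests with the capacity a node still has unrequested. The quantity parsers are assumptions that handle only the m, Ki, Mi, and Gi suffixes, and the image tag and numbers are example values.

```python
from typing import Dict

_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "512Mi" or "2Gi" to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


# Example container spec for Postgres (values are illustrative).
postgres_container = {
    "name": "postgres",
    "image": "postgres:16",
    "resources": {
        "requests": {"cpu": "500m", "memory": "1Gi"},  # what the scheduler reserves
        "limits": {"memory": "2Gi"},                    # cap enforced on the node
    },
}


def fits(allocatable: Dict[str, str], requested: Dict[str, str], container: dict) -> bool:
    """Simplified fit check: is there room for the container's requests
    after the requests of Pods already placed on the node?"""
    want = container["resources"]["requests"]
    free_cpu = parse_cpu(allocatable["cpu"]) - parse_cpu(requested["cpu"])
    free_mem = parse_memory(allocatable["memory"]) - parse_memory(requested["memory"])
    return parse_cpu(want["cpu"]) <= free_cpu and parse_memory(want["memory"]) <= free_mem


node = {"cpu": "2", "memory": "4Gi"}
print(fits(node, {"cpu": "1", "memory": "2Gi"}, postgres_container))      # True
print(fits(node, {"cpu": "1800m", "memory": "1Gi"}, postgres_container))  # False: 200m CPU left
```

Requests drive placement, and limits are enforced on the node. A container that exceeds its memory limit can be terminated.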
Why Run a Database on Kubernetes?
Running a Postgres database on Kubernetes offers three main benefits:
- Improved performance and scalability—Kubernetes is designed to run distributed, microservice-style applications, which makes it easier to build scalable database services. It schedules Pods according to the resources available on each node and automates application deployment. Postgres complements this with its Write-Ahead Log (WAL), which records every data change and flushes it to disk before the change is written to the data files. This makes crash and disaster recovery easier.
- Improved collaboration—multiple Pods can serve client requests together. A Kubernetes Service receives incoming client traffic and routes it to one of its Pods, balancing the load. Pods can be added or removed without interrupting the Service, so the deployment can be adapted or updated on demand. Teams can also share resource definitions with each other as Helm charts.
- Stateful workload support—stateful services preserve their state from one session to the next and require security, reliability, and performance. Kubernetes provides building blocks for these services, such as StatefulSets and PersistentVolumes, and automates operating them at scale.
Postgres instances and Kubernetes Pods can be adapted to work together, but database administrators may find this challenging. Several managed solutions help database operators run Postgres on Kubernetes and make the most of a container-orchestrated environment.
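The routing in the collaboration bullet is driven by label selectors. A Service sends traffic only to Pods whose labels contain every key/value pair in its selector. The simplified Python sketch below applies that equality-based matching to give Postgres a read/write endpoint and a read-only endpoint. The service function emits real v1 Service fields (selector, ports). The label names app and role, the Pod names, and the Service names are hypothetical conventions that neither Kubernetes nor Postgres defines.

```python
from typing import Dict, List


def service(name: str, selector: Dict[str, str]) -> dict:
    """Build a minimal v1 Service manifest for the default Postgres port."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": selector,
            "ports": [{"port": 5432, "targetPort": 5432}],
        },
    }


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based selection: every selector pair must appear in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(svc: dict, pods: Dict[str, Dict[str, str]]) -> List[str]:
    """Names of the Pods that would receive this Service's traffic."""
    selector = svc["spec"]["selector"]
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


# Hypothetical Pods: one primary and two standbys.
pods = {
    "pg-0": {"app": "postgres", "role": "primary"},
    "pg-1": {"app": "postgres", "role": "replica"},
    "pg-2": {"app": "postgres", "role": "replica"},
}

read_write = service("postgres-rw", {"app": "postgres", "role": "primary"})
read_only = service("postgres-ro", {"app": "postgres", "role": "replica"})

print(endpoints(read_write, pods))  # ['pg-0']
print(endpoints(read_only, pods))   # ['pg-1', 'pg-2']
```

Changing a Pod's role label changes which endpoint list it appears in. Relabeling a promoted standby is therefore enough, in this sketch, to move write traffic to it.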
Considerations Before Running a Postgres Workload in Kubernetes
Postgres workloads can benefit from Kubernetes automation, but because they are stateful, they must meet the following requirements to achieve the security, availability, and performance that critical applications need:
- Container-native storage—stateful services like Postgres need a data layer that provisions storage dynamically. Traditional storage solutions were not designed to provide storage directly to containers, and provisioning raw block storage by hand can slow down database provisioning. Container-native storage keeps the data available when Pods need to be rescheduled.
- Availability—Postgres uses the WAL to ensure data integrity: it logs each data change and flushes the log to disk before the change is written to the data files. After a crash or disaster, Postgres can replay the WAL to re-apply the changes. Because the WAL must be kept in a persistent location, the container management system should support persistent local storage. That storage must also deliver high read/write performance so that WAL writes do not slow down the database.
- Data security—PostgreSQL can encrypt client connections with SSL/TLS and encrypt individual values with the pgcrypto extension, and encrypted data can only be read back with the right key or password. PostgreSQL does not encrypt its data files on disk by default, so you need to enable security measures such as encryption and role-based access control explicitly, at the database and application level.
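The availability bullet describes the write-ahead rule: log the change durably first, apply it second, and replay the log after a crash. A toy key-value store in Python can show this. The sketch covers only the general technique. The ToyWALStore class, its JSON-lines log format, and the file name are assumptions made for illustration and bear no relation to PostgreSQL's actual WAL format or recovery code.

```python
import json
import os
import tempfile
from typing import Dict


class ToyWALStore:
    """Key-value store that logs each change to disk before applying it."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data: Dict[str, str] = {}  # stands in for data files; lost on "crash"
        self.recover()

    def set(self, key: str, value: str) -> None:
        record = json.dumps({"key": key, "value": value})
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(record + "\n")
            wal.flush()
            os.fsync(wal.fileno())  # the change is durable before it is applied
        self.data[key] = value      # only now modify the data

    def recover(self) -> None:
        """Rebuild state by replaying every logged change in order."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]


wal_file = os.path.join(tempfile.mkdtemp(), "toy.wal")
store = ToyWALStore(wal_file)
store.set("balance", "100")
store.set("balance", "80")

# Simulate a crash: drop the in-memory state and start a fresh instance.
del store
restarted = ToyWALStore(wal_file)
print(restarted.data)  # {'balance': '80'}
```

Every write in this sketch waits for fsync, which is why slow storage directly slows down writes. Real systems also take checkpoints, so recovery replays only the log written since the last checkpoint rather than the whole history.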
Tips to Initiate a Postgres Database in Kubernetes
To initiate a PostgreSQL database in Kubernetes, we first need to understand how Postgres works inside a Kubernetes environment. The PostgreSQL server executable is the center of every PostgreSQL instance.
The postgres executable (postgres.exe on Windows) is the core server process. It starts when the database server is launched, and it runs the database. In Kubernetes, we run this executable in a Pod, inside a PostgreSQL container.
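The idea of running the postgres executable in a Pod can be sketched in Python as a minimal Pod manifest printed as JSON, a format that kubectl apply -f also accepts. The sketch assumes the official postgres:16 container image, whose entrypoint starts the postgres server and reads the POSTGRES_PASSWORD environment variable. The Pod name, the Secret name postgres-credentials, and its password key are hypothetical. The volume is an emptyDir, so the data does not outlive the Pod.

```python
import json


def postgres_pod(name: str = "postgres-0") -> dict:
    """Minimal Pod running the postgres server in a single container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "postgres"}},
        "spec": {
            "containers": [{
                "name": "postgres",
                "image": "postgres:16",  # image entrypoint launches the postgres executable
                "ports": [{"containerPort": 5432}],
                "env": [{
                    "name": "POSTGRES_PASSWORD",
                    # Read the password from a Secret instead of hard-coding it.
                    "valueFrom": {"secretKeyRef": {
                        "name": "postgres-credentials",  # hypothetical Secret
                        "key": "password",
                    }},
                }],
                "volumeMounts": [{
                    "name": "data",
                    "mountPath": "/var/lib/postgresql/data",  # default PGDATA in this image
                }],
            }],
            # emptyDir lives only as long as the Pod: fine for a demo, not for real data.
            "volumes": [{"name": "data", "emptyDir": {}}],
        },
    }


print(json.dumps(postgres_pod(), indent=2))
```

You could save the output to a file and pass it to kubectl apply -f to create the Pod, provided the referenced Secret exists first.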
Some additional configuration tips to keep in mind include:
- Be careful with Pod specifications—Postgres instances are either primary or standby. There is usually a single primary instance, which handles both reads and writes. Standby instances serve only reads and can be promoted to primary if the primary fails. When running in Kubernetes, all instances need the same Pod specification so that a standby Pod can take over the primary role during a failover.
- Leverage Kubernetes architecture—Kubernetes is built from distributed agents (controllers) that coordinate through the API server and its central data store. Controllers can track the status of Postgres instances and manage the Postgres application as a whole. You can also run a sidecar container in each Pod for instance-level management, such as promoting the Pod to primary or demoting it to standby. However, if the Pod dies, its sidecar cannot report this to the controller. To cover that case, configure regular health checks (probes) for the Pods: the kubelet runs them and reports failing instances through the API server, where the controller can act on them.
- Create backups—it is good practice to keep backup copies of your data. If the platform reschedules a Pod and its data cannot be reused, the database can be restored from a backup instead of being rebuilt from scratch. This limits data loss and shortens the time it takes the new Pod to start serving.
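Several of these tips map onto a Kubernetes StatefulSet. Its single Pod template gives the primary and the standbys identical specifications. Its volumeClaimTemplates create a PersistentVolumeClaim per Pod, so each Pod keeps its own volume across restarts, which matches the container-native storage requirement. Liveness and readiness probes let the kubelet detect and report failing instances. The Python sketch below builds such a manifest. The object names, the Secret name, the storage class fast-ssd, the sizes, and the probe timings are assumptions, while pg_isready is PostgreSQL's real connection-check utility. A StatefulSet alone does not configure replication or perform failover; that still needs a sidecar, a managed solution, or manual setup.

```python
import json


def pg_isready_probe(initial_delay: int, period: int, failures: int) -> dict:
    """Exec probe that runs PostgreSQL's pg_isready utility in the container."""
    return {
        "exec": {"command": ["pg_isready", "-U", "postgres"]},
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
        "failureThreshold": failures,
    }


def postgres_statefulset(replicas: int = 3) -> dict:
    labels = {"app": "postgres"}
    container = {
        "name": "postgres",
        "image": "postgres:16",
        "ports": [{"containerPort": 5432}],
        "env": [
            {"name": "POSTGRES_PASSWORD",
             "valueFrom": {"secretKeyRef": {"name": "postgres-credentials",  # hypothetical
                                            "key": "password"}}},
            # Keep data in a subdirectory of the mount so initdb is not
            # confused by files such as lost+found at the volume root.
            {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"},
        ],
        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/postgresql/data"}],
        # Restart the container if the server stops accepting connections.
        "livenessProbe": pg_isready_probe(initial_delay=30, period=10, failures=6),
        # Remove the Pod from Service endpoints while it is not ready.
        "readinessProbe": pg_isready_probe(initial_delay=5, period=5, failures=3),
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "postgres"},
        "spec": {
            "serviceName": "postgres",  # headless Service that gives Pods stable names
            "replicas": replicas,       # e.g. one primary plus standbys
            "selector": {"matchLabels": labels},
            # A single template: every Pod, primary or standby, gets the same spec.
            "template": {"metadata": {"labels": labels},
                         "spec": {"containers": [container]}},
            # One PersistentVolumeClaim per Pod, provisioned dynamically.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": "fast-ssd",  # hypothetical StorageClass
                    "resources": {"requests": {"storage": "20Gi"}},
                },
            }],
        },
    }


print(json.dumps(postgres_statefulset(), indent=2))
```

The serviceName field expects a matching headless Service. The referenced Secret and StorageClass must exist before the Pods can start.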
Running PostgreSQL on Kubernetes is an opportunity to use the best of both tools. You can combine the robust data processing capabilities of PostgreSQL with the scalability, flexibility, and self-healing of Kubernetes. Together, they can deliver the reliability, data integrity, and high availability that successful database management requires.
Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.