Disaster Recovery Strategies for PostgreSQL Deployments on Kubernetes (Part 1)
For a production environment, it is important to design a disaster recovery strategy. The traditional strategy is to take regular database backups and store them offsite, but restoring a large database from backups can take hours or even days. Cloud computing has made disaster recovery simpler and more efficient: applications and databases can be deployed across multiple availability zones or regions to achieve both high availability and disaster recovery.
Today, many companies have migrated their applications to a microservices architecture and run their systems, including databases, on Kubernetes. But how do you implement disaster recovery for databases deployed on Kubernetes? In this post, I describe disaster recovery strategies for PostgreSQL deployments on Kubernetes.
Because databases are stateful services, they are hard to manage and scale on Kubernetes. PostgreSQL operators (e.g. the Crunchy PostgreSQL Operator and the Zalando PostgreSQL Operator) let you deploy and manage PostgreSQL clusters on Kubernetes, and both support multi-cluster deployments for disaster recovery and high availability. In the next sections, I describe how to use the Crunchy PostgreSQL Operator to achieve disaster recovery and high availability.
Multi-Cluster Deployments
Multi-cluster architecture
Below is the multi-cluster architecture.
As the architecture above shows, two PostgreSQL clusters are deployed in different Kubernetes clusters: one as the active PostgreSQL cluster and the other as the standby PostgreSQL cluster. The active cluster pushes WAL to external storage (S3 or GCS). One PostgreSQL instance in the standby cluster continuously reads WAL from the external storage, and the other replicas in the standby cluster are cascading replicas. The standby PostgreSQL cluster is read-only.
If the active PostgreSQL cluster goes down, you can manually promote the standby PostgreSQL cluster, and the old active cluster can later be recovered as a standby cluster.
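Under the hood, the operator relies on pgBackRest for WAL archiving and restore. Conceptually, the wiring looks roughly like the sketch below (simplified and illustrative; the operator generates the actual configuration for you, and the stanza name here is an assumption):
# Active cluster primary -- postgresql.conf excerpt (illustrative)
archive_mode = on
archive_command = 'pgbackrest --stanza=db archive-push %p'      # push WAL to the shared GCS/S3 repository
# Standby cluster leader -- recovery settings (illustrative)
restore_command = 'pgbackrest --stanza=db archive-get %f "%p"'  # fetch WAL from the same repository
# plus a standby.signal file (PostgreSQL 12+) to keep the instance in recovery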
Next, let's verify the multi-cluster architecture on Google Kubernetes Engine (GKE).
Creating prerequisite resources
The following steps assume you have installed and set up the Google Cloud SDK.
Configure IAM
Create an IAM service account for the PostgreSQL clusters and grant it the storage.admin role so that it can access Google Cloud Storage.
export PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create postgres-gcs
gcloud iam service-accounts keys create gcs-key.json \
--iam-account postgres-gcs@${PROJECT_ID}.iam.gserviceaccount.com
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--role roles/storage.admin \
--member=serviceAccount:postgres-gcs@${PROJECT_ID}.iam.gserviceaccount.com
Then the service account key file will be generated in the current directory.
Create GKE clusters
Create two GKE clusters in different regions.
gcloud container clusters create cluster1 \
--release-channel regular --zone us-central1-a
gcloud container clusters create cluster2 \
--release-channel regular --zone us-east1-b
Create a cloud storage bucket
Create a cloud storage bucket to store the backups and WAL.
gsutil mb gs://crunchy-postgres-backups
Install PostgreSQL Operator
Install PostgreSQL Operator in both GKE clusters. Here we use the default storage class.
curl -LO https://raw.githubusercontent.com/CrunchyData/postgres-operator/v4.7.0/installers/kubectl/postgres-operator.yml
kubectl create namespace pgo
kubectl apply -f postgres-operator.yml
Once the PostgreSQL Operator is deployed successfully, the postgres-operator pod will be in "Running" status.
kubectl get pod -n pgo
NAME                                READY   STATUS      RESTARTS   AGE
pgo-deploy-qlcnb                    0/1     Completed   0          3m17s
postgres-operator-85cbf887f-lnzlk   4/4     Running     1          2m27s
Install pgo client
Install the pgo client command-line tool.
curl -LO https://raw.githubusercontent.com/CrunchyData/postgres-operator/v4.7.0/installers/kubectl/client-setup.sh
chmod +x client-setup.sh
./client-setup.sh
cat <<EOF >> ~/.bashrc
export PATH=${HOME}/.pgo/pgo:$PATH
export PGOUSER=${HOME}/.pgo/pgo/pgouser
export PGO_CA_CERT=${HOME}/.pgo/pgo/client.crt
export PGO_CLIENT_CERT=${HOME}/.pgo/pgo/client.crt
export PGO_CLIENT_KEY=${HOME}/.pgo/pgo/client.key
export PGO_APISERVER_URL='https://127.0.0.1:8443'
export PGO_NAMESPACE=pgo
EOF
source ~/.bashrc
Run "pgo version" to verify the installation.
kubectl -n pgo port-forward svc/postgres-operator 8443:8443 &
If the installation is successful, you will see the version 4.7.0 in the result.
pgo version
pgo client version 4.7.0
pgo-apiserver version 4.7.0
Deploy active PostgreSQL cluster
First, let's deploy the active PostgreSQL cluster in GKE cluster1.
[GKE cluster1]# pgo create cluster hippo \
--replica-count=2 \
--pgbackrest-storage-type=gcs \
--pgbackrest-gcs-bucket=crunchy-postgres-backups \
--pgbackrest-gcs-key=gcs-key.json \
--password-superuser=postgres \
--password-replication=postgres \
--password=postgres
The command-line options:
- --replica-count: the number of replicas; here the cluster is deployed with two replicas
- --pgbackrest-storage-type: where pgBackRest stores the backups and WAL; here we use GCS
- --pgbackrest-gcs-bucket: the GCS bucket used for storing backups and WAL
- --pgbackrest-gcs-key: the file that contains the service account key
- --password-superuser, --password-replication, --password: the passwords for the PostgreSQL superuser, the replication user, and the standard PostgreSQL user. Here we set all of them to "postgres".
Verify the cluster deployment. If successful, you will see one primary and two replicas in "UP" status.
[GKE cluster1]# pgo test hippo
cluster : hippo
Services
primary (10.3.248.48:5432): UP
replica (10.3.246.216:5432): UP
Instances
primary (hippo-85474cd644-hpprz): UP
replica (hippo-kvko-77bbdd4756-2hjr7): UP
replica (hippo-xvgh-7bb554d876-6pxr8): UP
Connect to the primary PostgreSQL instance and run a few write queries.
[GKE cluster1]# kubectl port-forward svc/hippo 5432:5432 -n pgo &
[GKE cluster1]# psql -h 127.0.0.1 -U postgres
postgres=# create table t1 (id int);
postgres=# insert into t1 select generate_series(1,10);
Deploy standby PostgreSQL cluster
Next, let's deploy the standby PostgreSQL cluster with two cascading replicas in GKE cluster2.
Please note that you need to include the --pgbackrest-repo-path option and point it to the repository path of the active cluster (/backrestrepo/hippo-backrest-shared-repo, which is derived from the active cluster name hippo); otherwise the standby cluster would look for WAL under its own default path and could not replicate from the active cluster's repository.
[GKE cluster2]# pgo create cluster hippo-standby \
--standby \
--replica-count=2 \
--pgbackrest-repo-path=/backrestrepo/hippo-backrest-shared-repo \
--pgbackrest-storage-type=gcs \
--pgbackrest-gcs-bucket=crunchy-postgres-backups \
--pgbackrest-gcs-key=gcs-key.json \
--password-superuser=postgres \
--password-replication=postgres \
--password=postgres
Verify the standby cluster deployment.
[GKE cluster2]# pgo test hippo-standby
cluster : hippo-standby
Services
primary (10.7.248.242:5432): UP
replica (10.7.250.51:5432): UP
Instances
primary (hippo-standby-7d949bb7f8-d7xwn): UP
replica (hippo-standby-dsiw-5455d859d9-sr8c7): UP
replica (hippo-standby-lpqp-599c56f5cc-dgbnd): UP
Once all the instances are in "UP" status, let's connect to the standby PostgreSQL cluster and verify data replication.
[GKE cluster2]# kubectl port-forward svc/hippo-standby 5432:5432 -n pgo &
[GKE cluster2]# psql -h 127.0.0.1 -U postgres
postgres=# select * from t1;
id
----
1
2
3
4
5
6
7
8
9
10
As you can see in the result above, replication has been configured correctly.
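Because the standby cluster is read-only, any write attempted against it should be rejected; for example, an insert would fail with an error along these lines:
postgres=# insert into t1 values (100);
ERROR:  cannot execute INSERT in a read-only transaction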
Promote standby PostgreSQL cluster
First, let's shut down the active PostgreSQL cluster.
[GKE cluster1]# pgo update cluster hippo --shutdown
You need to ensure the active PostgreSQL cluster is shut down completely to prevent a split-brain situation in which two primary clusters write to the same pgBackRest repository. Verify that the primary is down:
[GKE cluster1]# pgo test hippo
cluster : hippo
Services
Instances
primary (): DOWN
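With the active cluster fully shut down, promote the standby cluster. In Crunchy PostgreSQL Operator 4.x this is done with the --promote-standby flag (shown here as a sketch; check the operator documentation for your version):
[GKE cluster2]# pgo update cluster hippo-standby --promote-standby
Once the promotion completes, connect to the promoted cluster and verify that it now accepts writes.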
[GKE cluster2]# psql -h 127.0.0.1 -U postgres
postgres=# insert into t1 select generate_series(11,20);
INSERT 0 10