Writing Your Own Kubernetes Operator - A Practical Guide
Posted on Sun 22 March 2026 by Sanyam Khurana in DevOps
Kubernetes is great at managing stateless workloads out of the box. You define a Deployment, it creates Pods, handles rolling updates, and restarts them if they crash. But what about stateful applications like databases? What if you want Kubernetes to understand how to create a PostgreSQL replica, take a backup, or handle a failover? That's where Operators come in.
An Operator is essentially you teaching Kubernetes how to manage a specific application. Instead of writing runbooks for your team to follow, you encode that operational knowledge into code. Kubernetes then does the work for you, 24/7, without getting tired or making typos.
In this post, we'll build a simple PostgreSQL Operator from scratch using Go and Kubebuilder. By the end, you'll have a working Operator that can create and manage PostgreSQL instances on your cluster.
The Operator Pattern
Before we write code, let's understand the pattern. Every Operator has two parts:
1. Custom Resource Definition (CRD): This extends the Kubernetes API with your own resource type. Just like Kubernetes has Deployment, Service, and Pod, your CRD creates something like PostgresDB. Users can then do kubectl apply with a YAML file that defines a PostgresDB resource.
2. Controller: This is the brains of the operation. It watches for changes to your custom resources and takes action to make the actual state match the desired state. This is called the reconciliation loop - the same pattern that makes Deployments work. "User wants 3 replicas, but there are only 2? Create one more."
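To make the reconciliation idea concrete before any Kubebuilder machinery, here's a toy, self-contained Go sketch: compare desired state with actual state and emit the actions that close the gap. This is illustrative only, not controller-runtime code.

```go
package main

import "fmt"

// reconcile is a toy version of the reconciliation loop: given desired
// and actual replica counts, return the actions needed to converge.
func reconcile(desired, actual int) []string {
	var actions []string
	for actual < desired {
		actions = append(actions, "create replica")
		actual++
	}
	for actual > desired {
		actions = append(actions, "delete replica")
		actual--
	}
	return actions
}

func main() {
	// User wants 3 replicas, only 2 exist -> one create action.
	fmt.Println(reconcile(3, 2)) // prints [create replica]
}
```

A real controller does exactly this, except the "actions" are Kubernetes API calls and the loop re-runs on every relevant event.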
Prerequisites
Before we start, make sure you have:
- Go 1.21+ installed
- Docker installed
- kubectl configured and connected to a cluster (minikube works fine)
- Kubebuilder installed:
# Install Kubebuilder
curl -L -o kubebuilder "https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)"
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/
Step 1: Scaffold the Project
Kubebuilder generates all the boilerplate for us:
mkdir postgres-operator && cd postgres-operator
kubebuilder init --domain thegeekyway.com --repo github.com/thegeekyway/postgres-operator
Now create the API (the CRD and Controller):
kubebuilder create api --group database --version v1alpha1 --kind PostgresDB
When prompted, say y to both "Create Resource" and "Create Controller".
This gives us a project structure like:
postgres-operator/
api/v1alpha1/
postgresdb_types.go # CRD definition
internal/controller/
postgresdb_controller.go # Reconciliation logic
config/
crd/ # Generated CRD manifests
rbac/ # RBAC permissions
cmd/
main.go # Entry point
Step 2: Define the Custom Resource
Open api/v1alpha1/postgresdb_types.go. This is where we define what a PostgresDB resource looks like. The scaffolded code has empty spec and status structs. Let's fill them in:
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PostgresDBSpec defines the desired state of PostgresDB
type PostgresDBSpec struct {
// Version of PostgreSQL to deploy
Version string `json:"version"`
// Storage size for the database (e.g., "1Gi", "10Gi")
Storage string `json:"storage"`
// Number of replicas (1 = primary only, 2+ = primary + replicas)
Replicas int32 `json:"replicas"`
// Database name to create on initialization
DatabaseName string `json:"databaseName"`
}
// PostgresDBStatus defines the observed state of PostgresDB
type PostgresDBStatus struct {
// Ready indicates whether the database is accepting connections
Ready bool `json:"ready"`
// Phase represents the current lifecycle phase
// Can be: Pending, Creating, Running, Failed
Phase string `json:"phase"`
// ConnectionString for applications to connect
ConnectionString string `json:"connectionString,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="Version",type=string,JSONPath=`.spec.version`
//+kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
//+kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
//+kubebuilder:printcolumn:name="Ready",type=boolean,JSONPath=`.status.ready`
// PostgresDB is the Schema for the postgresdbs API
type PostgresDB struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec PostgresDBSpec `json:"spec,omitempty"`
Status PostgresDBStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// PostgresDBList contains a list of PostgresDB
type PostgresDBList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []PostgresDB `json:"items"`
}
func init() {
SchemeBuilder.Register(&PostgresDB{}, &PostgresDBList{})
}
The +kubebuilder comments are markers that Kubebuilder uses to generate the CRD YAML. The printcolumn markers define what shows up when you run kubectl get postgresdbs.
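Kubebuilder also supports validation markers, which are compiled into OpenAPI schema validation in the generated CRD so the API server rejects bad specs before your controller ever sees them. As a sketch, you could constrain the Replicas and Storage fields like this (the specific bounds and regex are illustrative):

```go
// Number of replicas (1 = primary only, 2+ = primary + replicas)
//+kubebuilder:validation:Minimum=1
//+kubebuilder:validation:Maximum=5
Replicas int32 `json:"replicas"`

// Storage size for the database (e.g., "1Gi", "10Gi")
//+kubebuilder:validation:Pattern=`^[0-9]+(Mi|Gi|Ti)$`
Storage string `json:"storage"`
```

With these in place, a kubectl apply with replicas: 0 fails at admission time instead of producing a broken StatefulSet.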
Now regenerate the manifests:
make manifests
make generate
Step 3: Write the Controller
This is the core of our Operator. Open internal/controller/postgresdb_controller.go and let's write the reconciliation logic:
package controller
import (
"context"
"fmt"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
databasev1alpha1 "github.com/thegeekyway/postgres-operator/api/v1alpha1"
)
type PostgresDBReconciler struct {
client.Client
Scheme *runtime.Scheme
}
//+kubebuilder:rbac:groups=database.thegeekyway.com,resources=postgresdbs,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=database.thegeekyway.com,resources=postgresdbs/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups="",resources=persistentvolumeclaims,verbs=get;list;watch;create;update;patch;delete
func (r *PostgresDBReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// Fetch the PostgresDB resource
pgdb := &databasev1alpha1.PostgresDB{}
if err := r.Get(ctx, req.NamespacedName, pgdb); err != nil {
if errors.IsNotFound(err) {
// Resource was deleted, nothing to do
return ctrl.Result{}, nil
}
return ctrl.Result{}, err
}
logger.Info("Reconciling PostgresDB", "name", pgdb.Name)
// Update status to Creating if it's new
if pgdb.Status.Phase == "" {
pgdb.Status.Phase = "Creating"
if err := r.Status().Update(ctx, pgdb); err != nil {
return ctrl.Result{}, err
}
}
// Ensure the Service exists
if err := r.ensureService(ctx, pgdb); err != nil {
return ctrl.Result{}, err
}
// Ensure the StatefulSet exists
if err := r.ensureStatefulSet(ctx, pgdb); err != nil {
return ctrl.Result{}, err
}
// Update status
pgdb.Status.Phase = "Running"
pgdb.Status.Ready = true
pgdb.Status.ConnectionString = fmt.Sprintf(
"postgresql://%s.%s.svc.cluster.local:5432/%s",
pgdb.Name, pgdb.Namespace, pgdb.Spec.DatabaseName,
)
if err := r.Status().Update(ctx, pgdb); err != nil {
return ctrl.Result{}, err
}
logger.Info("PostgresDB reconciled successfully", "name", pgdb.Name)
return ctrl.Result{}, nil
}
func (r *PostgresDBReconciler) ensureService(ctx context.Context, pgdb *databasev1alpha1.PostgresDB) error {
svc := &corev1.Service{}
err := r.Get(ctx, types.NamespacedName{Name: pgdb.Name, Namespace: pgdb.Namespace}, svc)
if err == nil {
return nil // Service already exists
}
if !errors.IsNotFound(err) {
return err
}
// Create the Service
svc = &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: pgdb.Name,
Namespace: pgdb.Namespace,
},
Spec: corev1.ServiceSpec{
Selector: map[string]string{
"app": "postgres",
"postgresdb": pgdb.Name,
},
Ports: []corev1.ServicePort{
{
Port: 5432,
Name: "postgres",
Protocol: corev1.ProtocolTCP,
},
},
ClusterIP: "None", // Headless service for StatefulSet
},
}
// Set the owner reference so the Service is garbage collected
// when the PostgresDB is deleted
if err := ctrl.SetControllerReference(pgdb, svc, r.Scheme); err != nil {
return err
}
return r.Create(ctx, svc)
}
func (r *PostgresDBReconciler) ensureStatefulSet(ctx context.Context, pgdb *databasev1alpha1.PostgresDB) error {
sts := &appsv1.StatefulSet{}
err := r.Get(ctx, types.NamespacedName{Name: pgdb.Name, Namespace: pgdb.Namespace}, sts)
if err == nil {
// StatefulSet exists, check if replicas need updating
if sts.Spec.Replicas != nil && *sts.Spec.Replicas != pgdb.Spec.Replicas {
sts.Spec.Replicas = &pgdb.Spec.Replicas
return r.Update(ctx, sts)
}
return nil
}
if !errors.IsNotFound(err) {
return err
}
// Create the StatefulSet
labels := map[string]string{
"app": "postgres",
"postgresdb": pgdb.Name,
}
// Note: resource.MustParse panics on an invalid value; a validation
// marker on the Storage field is one way to guard against this.
storageQuantity := resource.MustParse(pgdb.Spec.Storage)
sts = &appsv1.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
Name: pgdb.Name,
Namespace: pgdb.Namespace,
},
Spec: appsv1.StatefulSetSpec{
ServiceName: pgdb.Name,
Replicas: &pgdb.Spec.Replicas,
Selector: &metav1.LabelSelector{
MatchLabels: labels,
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: labels,
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "postgres",
Image: fmt.Sprintf("postgres:%s", pgdb.Spec.Version),
Ports: []corev1.ContainerPort{
{
ContainerPort: 5432,
Name: "postgres",
},
},
Env: []corev1.EnvVar{
{
Name: "POSTGRES_DB",
Value: pgdb.Spec.DatabaseName,
},
{
Name: "POSTGRES_USER",
Value: "postgres",
},
{
Name: "POSTGRES_PASSWORD",
Value: "changeme", // In production, use a Secret
},
{
Name: "PGDATA",
Value: "/var/lib/postgresql/data/pgdata",
},
},
VolumeMounts: []corev1.VolumeMount{
{
Name: "data",
MountPath: "/var/lib/postgresql/data",
},
},
},
},
},
},
VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
{
ObjectMeta: metav1.ObjectMeta{
Name: "data",
},
Spec: corev1.PersistentVolumeClaimSpec{
AccessModes: []corev1.PersistentVolumeAccessMode{
corev1.ReadWriteOnce,
},
Resources: corev1.VolumeResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceStorage: storageQuantity,
},
},
},
},
},
},
}
if err := ctrl.SetControllerReference(pgdb, sts, r.Scheme); err != nil {
return err
}
return r.Create(ctx, sts)
}
func (r *PostgresDBReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&databasev1alpha1.PostgresDB{}).
Owns(&appsv1.StatefulSet{}).
Owns(&corev1.Service{}).
Complete(r)
}
Let me walk through what's happening here:
- Reconcile is the heart of the controller. Kubernetes calls this function whenever a PostgresDB resource is created, updated, or deleted. Our job is to make reality match the desired state.
- ensureService creates a headless Service for the StatefulSet. Headless Services give each Pod a stable DNS name, which is important for databases.
- ensureStatefulSet creates a StatefulSet that runs PostgreSQL. StatefulSets are the right choice for databases because they provide stable network identities and persistent storage.
- SetupWithManager tells the controller what resources to watch. We watch PostgresDB resources directly, and also watch the StatefulSets and Services that we own, so we get notified if someone modifies or deletes them.
The ctrl.SetControllerReference calls are important. They set up owner references so that when a PostgresDB is deleted, Kubernetes automatically garbage collects the Service and StatefulSet.
Step 4: Create a Sample Resource
Create a file config/samples/database_v1alpha1_postgresdb.yaml:
apiVersion: database.thegeekyway.com/v1alpha1
kind: PostgresDB
metadata:
name: my-postgres
namespace: default
spec:
version: "16"
storage: "1Gi"
replicas: 1
databaseName: "myapp"
That's all a user needs to write. No StatefulSet YAML, no Service YAML, no PVC YAML. Just "I want a PostgreSQL 16 instance with 1Gi storage."
Step 5: Run and Test
Let's test our Operator locally against a cluster:
# Install the CRDs into the cluster
make install
# Run the controller locally (for development)
make run
In another terminal, create a PostgresDB:
kubectl apply -f config/samples/database_v1alpha1_postgresdb.yaml
Check what was created:
$ kubectl get postgresdbs
NAME VERSION REPLICAS PHASE READY
my-postgres 16 1 Running true
$ kubectl get statefulsets
NAME READY AGE
my-postgres 1/1 30s
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-postgres ClusterIP None <none> 5432/TCP 30s
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES AGE
data-my-postgres-0 Bound pv-xxx 1Gi RWO 30s
From a single ten-line YAML manifest, our Operator created a StatefulSet, a headless Service, and a PersistentVolumeClaim. That's the power of the Operator pattern.
Step 6: Build and Deploy
When you're ready to run the Operator inside the cluster (instead of locally):
# Build the Docker image
make docker-build IMG=your-registry/postgres-operator:v0.1.0
# Push to your registry
make docker-push IMG=your-registry/postgres-operator:v0.1.0
# Deploy to the cluster
make deploy IMG=your-registry/postgres-operator:v0.1.0
The Operator now runs as a Deployment in the postgres-operator-system namespace.
Taking It Further
Our Operator is functional but basic. A production-ready PostgreSQL Operator would include:
Password Management with Secrets
Instead of hardcoding the password, generate a random one and store it in a Kubernetes Secret:
// In your reconciler, before creating the StatefulSet:
secret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("%s-credentials", pgdb.Name),
Namespace: pgdb.Namespace,
},
StringData: map[string]string{
"POSTGRES_PASSWORD": generateRandomPassword(24),
},
}
Then reference the Secret in the StatefulSet's env vars using SecretKeyRef.
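The generateRandomPassword helper isn't part of the scaffold; here is one possible stdlib-only implementation using crypto/rand (a sketch — adjust the character set and length policy to your requirements):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// generateRandomPassword returns a random alphanumeric password of the
// given length, drawing from crypto/rand for cryptographic randomness.
func generateRandomPassword(length int) string {
	const charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
	password := make([]byte, length)
	for i := range password {
		// Pick a uniformly random index into the charset.
		n, err := rand.Int(rand.Reader, big.NewInt(int64(len(charset))))
		if err != nil {
			panic(err) // reading from crypto/rand should not fail in practice
		}
		password[i] = charset[n.Int64()]
	}
	return string(password)
}

func main() {
	fmt.Println(len(generateRandomPassword(24))) // prints 24
}
```

One design note: create the Secret only if it doesn't already exist and reuse it on later reconciles — regenerating the password on every loop would break running Pods.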
Health Checks
Add liveness and readiness probes to the container:
LivenessProbe: &corev1.Probe{
ProbeHandler: corev1.ProbeHandler{
Exec: &corev1.ExecAction{
Command: []string{"pg_isready", "-U", "postgres"},
},
},
InitialDelaySeconds: 30,
PeriodSeconds: 10,
},
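The snippet above covers liveness only; a matching readiness probe can reuse the same pg_isready check with a shorter delay (the exact timings here are illustrative):

```go
ReadinessProbe: &corev1.Probe{
ProbeHandler: corev1.ProbeHandler{
Exec: &corev1.ExecAction{
Command: []string{"pg_isready", "-U", "postgres"},
},
},
InitialDelaySeconds: 5,
PeriodSeconds: 5,
},
```

The distinction matters: a failing readiness probe removes the Pod from Service endpoints, while a failing liveness probe restarts the container.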
Backup CRD
You could create another CRD called PostgresBackup that triggers pg_dump on a schedule:
apiVersion: database.thegeekyway.com/v1alpha1
kind: PostgresBackup
metadata:
name: nightly-backup
spec:
postgresRef: my-postgres
schedule: "0 2 * * *"
destination: "s3://my-backups/postgres/"
Status Conditions
Instead of a simple Ready boolean, use proper Kubernetes status conditions to report detailed state:
type PostgresDBStatus struct {
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
This follows Kubernetes conventions and makes your CRD work nicely with tools like kubectl wait.
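Using the SetStatusCondition helper from k8s.io/apimachinery/pkg/api/meta, the reconciler could set a condition like this (a sketch — the Reason and Message strings are illustrative):

```go
import "k8s.io/apimachinery/pkg/api/meta"

meta.SetStatusCondition(&pgdb.Status.Conditions, metav1.Condition{
Type:    "Ready",
Status:  metav1.ConditionTrue,
Reason:  "StatefulSetReady",
Message: "All replicas are running and accepting connections",
})
```

The helper handles LastTransitionTime for you, updating it only when the condition's status actually changes.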
Operator Frameworks Comparison
Kubebuilder isn't the only option. Here's a quick look at the alternatives:
| Framework | Language | Complexity | Best For |
|---|---|---|---|
| Kubebuilder | Go | Medium | Production operators with full Kubernetes API access |
| Operator SDK | Go, Ansible, Helm | Medium | Broader tooling, includes Kubebuilder under the hood |
| Kopf | Python | Low | Quick prototyping and simpler operators |
| Metacontroller | Any (webhooks) | Low | When you don't want to deal with client libraries |
If Go isn't your thing, Kopf is a fantastic choice. You can write a basic operator in under 50 lines of Python. But for production workloads where performance and reliability matter, Go with Kubebuilder is the standard.
Summary
Building a Kubernetes Operator boils down to three concepts:
- Define a CRD that describes what users want (the "what")
- Write a controller that makes it happen (the "how")
- Use the reconciliation loop to continuously ensure desired state matches actual state
The PostgreSQL Operator we built is simple, but it demonstrates the core pattern. Every operator - whether it's managing databases, message queues, or ML training jobs - follows this same structure. The reconciliation loop is always: "What does the user want? What exists right now? What do I need to do to bridge the gap?"
The beauty of Operators is that once you encode your operational knowledge into one, it works tirelessly. No more 3 AM pager alerts for things that can be automated. And that, to me, is what good engineering is about.
If you have any questions about building Kubernetes Operators, please let us know in the comments section below.