Writing Your Own Kubernetes Operator - A Practical Guide

Posted on Sun 22 March 2026 by Sanyam Khurana in DevOps

Kubernetes is great at managing stateless workloads out of the box. You define a Deployment, it creates Pods, handles rolling updates, and restarts them if they crash. But what about stateful applications like databases? What if you want Kubernetes to understand how to create a PostgreSQL replica, take a backup, or handle a failover? That's where Operators come in.

An Operator is essentially you teaching Kubernetes how to manage a specific application. Instead of writing runbooks for your team to follow, you encode that operational knowledge into code. Kubernetes then does the work for you, 24/7, without getting tired or making typos.

In this post, we'll build a simple PostgreSQL Operator from scratch using Go and Kubebuilder. By the end, you'll have a working Operator that can create and manage PostgreSQL instances on your cluster.

The Operator Pattern

Before we write code, let's understand the pattern. Every Operator has two parts:

1. Custom Resource Definition (CRD): This extends the Kubernetes API with your own resource type. Just like Kubernetes has Deployment, Service, and Pod, your CRD creates something like PostgresDB. Users can then do kubectl apply with a YAML file that defines a PostgresDB resource.

2. Controller: This is the brains of the operation. It watches for changes to your custom resources and takes action to make the actual state match the desired state. This is called the reconciliation loop - the same pattern that makes Deployments work. "User wants 3 replicas, but there are only 2? Create one more."
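Stripped of all Kubernetes machinery, the reconciliation loop is just "compare, act, repeat". Here is a toy Go sketch of that idea (the function and variable names are invented for illustration, not part of any real API):

```go
package main

import "fmt"

// reconcile nudges actual state one step toward desired state and reports
// whether more work remains -- the same shape every controller follows.
func reconcile(desired, actual int) (int, bool) {
	switch {
	case actual < desired:
		return actual + 1, true // create one more replica
	case actual > desired:
		return actual - 1, true // tear one down
	default:
		return actual, false // converged, nothing to do
	}
}

func main() {
	desired, actual := 3, 1
	for {
		next, again := reconcile(desired, actual)
		fmt.Printf("actual=%d -> %d\n", actual, next)
		actual = next
		if !again {
			break
		}
	}
}
```

Real controllers do exactly this, except "create one more" means creating Pods, PVCs, or Services through the API server.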

Prerequisites

Before we start, make sure you have:

  • Go 1.21+ installed
  • Docker installed
  • kubectl configured and connected to a cluster (minikube works fine)
  • Kubebuilder installed:
# Install Kubebuilder
curl -L -o kubebuilder "https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)"
chmod +x kubebuilder && sudo mv kubebuilder /usr/local/bin/

Step 1: Scaffold the Project

Kubebuilder generates all the boilerplate for us:

mkdir postgres-operator && cd postgres-operator
kubebuilder init --domain thegeekyway.com --repo github.com/thegeekyway/postgres-operator

Now create the API (the CRD and Controller):

kubebuilder create api --group database --version v1alpha1 --kind PostgresDB

When prompted, say y to both "Create Resource" and "Create Controller".

This gives us a project structure like:

postgres-operator/
  api/v1alpha1/
    postgresdb_types.go     # CRD definition
  internal/controller/
    postgresdb_controller.go # Reconciliation logic
  config/
    crd/                     # Generated CRD manifests
    rbac/                    # RBAC permissions
  cmd/
    main.go                  # Entry point

Step 2: Define the Custom Resource

Open api/v1alpha1/postgresdb_types.go. This is where we define what a PostgresDB resource looks like. The scaffolded code has empty spec and status structs. Let's fill them in:

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PostgresDBSpec defines the desired state of PostgresDB
type PostgresDBSpec struct {
    // Version of PostgreSQL to deploy
    Version string `json:"version"`

    // Storage size for the database (e.g., "1Gi", "10Gi")
    Storage string `json:"storage"`

    // Number of replicas (1 = primary only, 2+ = primary + replicas)
    Replicas int32 `json:"replicas"`

    // Database name to create on initialization
    DatabaseName string `json:"databaseName"`
}

// PostgresDBStatus defines the observed state of PostgresDB
type PostgresDBStatus struct {
    // Ready indicates whether the database is accepting connections
    Ready bool `json:"ready"`

    // Phase represents the current lifecycle phase
    // Can be: Pending, Creating, Running, Failed
    Phase string `json:"phase"`

    // ConnectionString for applications to connect
    ConnectionString string `json:"connectionString,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="Version",type=string,JSONPath=`.spec.version`
//+kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
//+kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
//+kubebuilder:printcolumn:name="Ready",type=boolean,JSONPath=`.status.ready`

// PostgresDB is the Schema for the postgresdbs API
type PostgresDB struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   PostgresDBSpec   `json:"spec,omitempty"`
    Status PostgresDBStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// PostgresDBList contains a list of PostgresDB
type PostgresDBList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []PostgresDB `json:"items"`
}

func init() {
    SchemeBuilder.Register(&PostgresDB{}, &PostgresDBList{})
}

The +kubebuilder comments are markers that Kubebuilder uses to generate the CRD YAML. The printcolumn markers define what shows up when you run kubectl get postgresdbs.

Now regenerate the manifests:

make manifests
make generate
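make manifests writes the CRD YAML under config/crd/bases/. A heavily trimmed sketch of its shape, so you know what to expect (field names follow the standard apiextensions.k8s.io/v1 format; your generated file will contain the full OpenAPI schema and all four printer columns):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresdbs.database.thegeekyway.com
spec:
  group: database.thegeekyway.com
  names:
    kind: PostgresDB
    listKind: PostgresDBList
    plural: postgresdbs
    singular: postgresdb
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        status: {}          # from the +kubebuilder:subresource:status marker
      additionalPrinterColumns:
        - name: Version     # one entry per printcolumn marker
          type: string
          jsonPath: .spec.version
      schema:
        openAPIV3Schema: {} # full schema generated from the Go structs
```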

Step 3: Write the Controller

This is the core of our Operator. Open internal/controller/postgresdb_controller.go and let's write the reconciliation logic:

package controller

import (
    "context"
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    databasev1alpha1 "github.com/thegeekyway/postgres-operator/api/v1alpha1"
)

type PostgresDBReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

//+kubebuilder:rbac:groups=database.thegeekyway.com,resources=postgresdbs,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=database.thegeekyway.com,resources=postgresdbs/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups="",resources=persistentvolumeclaims,verbs=get;list;watch;create;update;patch;delete

func (r *PostgresDBReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // Fetch the PostgresDB resource
    pgdb := &databasev1alpha1.PostgresDB{}
    if err := r.Get(ctx, req.NamespacedName, pgdb); err != nil {
        if errors.IsNotFound(err) {
            // Resource was deleted, nothing to do
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    logger.Info("Reconciling PostgresDB", "name", pgdb.Name)

    // Update status to Creating if it's new
    if pgdb.Status.Phase == "" {
        pgdb.Status.Phase = "Creating"
        if err := r.Status().Update(ctx, pgdb); err != nil {
            return ctrl.Result{}, err
        }
    }

    // Ensure the Service exists
    if err := r.ensureService(ctx, pgdb); err != nil {
        return ctrl.Result{}, err
    }

    // Ensure the StatefulSet exists
    if err := r.ensureStatefulSet(ctx, pgdb); err != nil {
        return ctrl.Result{}, err
    }

    // Update status
    pgdb.Status.Phase = "Running"
    pgdb.Status.Ready = true
    pgdb.Status.ConnectionString = fmt.Sprintf(
        "postgresql://%s.%s.svc.cluster.local:5432/%s",
        pgdb.Name, pgdb.Namespace, pgdb.Spec.DatabaseName,
    )
    if err := r.Status().Update(ctx, pgdb); err != nil {
        return ctrl.Result{}, err
    }

    logger.Info("PostgresDB reconciled successfully", "name", pgdb.Name)
    return ctrl.Result{}, nil
}

func (r *PostgresDBReconciler) ensureService(ctx context.Context, pgdb *databasev1alpha1.PostgresDB) error {
    svc := &corev1.Service{}
    err := r.Get(ctx, types.NamespacedName{Name: pgdb.Name, Namespace: pgdb.Namespace}, svc)
    if err == nil {
        return nil // Service already exists
    }
    if !errors.IsNotFound(err) {
        return err
    }

    // Create the Service
    svc = &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      pgdb.Name,
            Namespace: pgdb.Namespace,
        },
        Spec: corev1.ServiceSpec{
            Selector: map[string]string{
                "app":        "postgres",
                "postgresdb": pgdb.Name,
            },
            Ports: []corev1.ServicePort{
                {
                    Port:     5432,
                    Name:     "postgres",
                    Protocol: corev1.ProtocolTCP,
                },
            },
            ClusterIP: "None", // Headless service for StatefulSet
        },
    }

    // Set the owner reference so the Service is garbage collected
    // when the PostgresDB is deleted
    if err := ctrl.SetControllerReference(pgdb, svc, r.Scheme); err != nil {
        return err
    }
    return r.Create(ctx, svc)
}

func (r *PostgresDBReconciler) ensureStatefulSet(ctx context.Context, pgdb *databasev1alpha1.PostgresDB) error {
    sts := &appsv1.StatefulSet{}
    err := r.Get(ctx, types.NamespacedName{Name: pgdb.Name, Namespace: pgdb.Namespace}, sts)
    if err == nil {
        // StatefulSet exists, check if replicas need updating
        if *sts.Spec.Replicas != pgdb.Spec.Replicas {
            sts.Spec.Replicas = &pgdb.Spec.Replicas
            return r.Update(ctx, sts)
        }
        return nil
    }
    if !errors.IsNotFound(err) {
        return err
    }

    // Create the StatefulSet
    labels := map[string]string{
        "app":        "postgres",
        "postgresdb": pgdb.Name,
    }

    storageQuantity := resource.MustParse(pgdb.Spec.Storage)

    sts = &appsv1.StatefulSet{
        ObjectMeta: metav1.ObjectMeta{
            Name:      pgdb.Name,
            Namespace: pgdb.Namespace,
        },
        Spec: appsv1.StatefulSetSpec{
            ServiceName: pgdb.Name,
            Replicas:    &pgdb.Spec.Replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: labels,
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: labels,
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{
                        {
                            Name:  "postgres",
                            Image: fmt.Sprintf("postgres:%s", pgdb.Spec.Version),
                            Ports: []corev1.ContainerPort{
                                {
                                    ContainerPort: 5432,
                                    Name:          "postgres",
                                },
                            },
                            Env: []corev1.EnvVar{
                                {
                                    Name:  "POSTGRES_DB",
                                    Value: pgdb.Spec.DatabaseName,
                                },
                                {
                                    Name:  "POSTGRES_USER",
                                    Value: "postgres",
                                },
                                {
                                    Name:  "POSTGRES_PASSWORD",
                                    Value: "changeme", // In production, use a Secret
                                },
                                {
                                    Name:  "PGDATA",
                                    Value: "/var/lib/postgresql/data/pgdata",
                                },
                            },
                            VolumeMounts: []corev1.VolumeMount{
                                {
                                    Name:      "data",
                                    MountPath: "/var/lib/postgresql/data",
                                },
                            },
                        },
                    },
                },
            },
            VolumeClaimTemplates: []corev1.PersistentVolumeClaim{
                {
                    ObjectMeta: metav1.ObjectMeta{
                        Name: "data",
                    },
                    Spec: corev1.PersistentVolumeClaimSpec{
                        AccessModes: []corev1.PersistentVolumeAccessMode{
                            corev1.ReadWriteOnce,
                        },
                        Resources: corev1.VolumeResourceRequirements{
                            Requests: corev1.ResourceList{
                                corev1.ResourceStorage: storageQuantity,
                            },
                        },
                    },
                },
            },
        },
    }

    if err := ctrl.SetControllerReference(pgdb, sts, r.Scheme); err != nil {
        return err
    }
    return r.Create(ctx, sts)
}

func (r *PostgresDBReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&databasev1alpha1.PostgresDB{}).
        Owns(&appsv1.StatefulSet{}).
        Owns(&corev1.Service{}).
        Complete(r)
}

Let me walk through what's happening here:

  • Reconcile is the heart of the controller. Kubernetes calls this function whenever a PostgresDB resource is created, updated, or deleted. Our job is to make reality match the desired state.
  • ensureService creates a headless Service for the StatefulSet. Headless services give each Pod a stable DNS name, which is important for databases.
  • ensureStatefulSet creates a StatefulSet that runs PostgreSQL. StatefulSets are the right choice for databases because they provide stable network identities and persistent storage.
  • SetupWithManager tells the controller what resources to watch. We watch PostgresDB resources directly, and also watch StatefulSets and Services that we own, so we get notified if someone modifies or deletes them.

The ctrl.SetControllerReference calls are important. They set up owner references so that when a PostgresDB is deleted, Kubernetes automatically garbage collects the Service and StatefulSet.
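The garbage collection those owner references trigger is conceptually simple; here is a toy model in plain Go (the types are invented for illustration, not the real API objects):

```go
package main

import "fmt"

// object is a toy stand-in for a Kubernetes resource with an owner reference.
type object struct {
	name  string
	owner string // empty means no owner
}

// collect deletes every object whose owner no longer exists, repeating
// until nothing more can be removed -- roughly what the Kubernetes
// garbage collector does when it walks owner references.
func collect(objs map[string]object) {
	for {
		deleted := false
		for name, o := range objs {
			if o.owner != "" {
				if _, ok := objs[o.owner]; !ok {
					delete(objs, name)
					deleted = true
				}
			}
		}
		if !deleted {
			return
		}
	}
}

func main() {
	objs := map[string]object{
		"my-postgres-sts": {name: "my-postgres-sts", owner: "my-postgres"},
		"my-postgres-svc": {name: "my-postgres-svc", owner: "my-postgres"},
	}
	// The owning PostgresDB "my-postgres" is gone, so its children go too.
	collect(objs)
	fmt.Println(len(objs)) // 0
}
```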

Step 4: Create a Sample Resource

Create a file config/samples/database_v1alpha1_postgresdb.yaml:

apiVersion: database.thegeekyway.com/v1alpha1
kind: PostgresDB
metadata:
  name: my-postgres
  namespace: default
spec:
  version: "16"
  storage: "1Gi"
  replicas: 1
  databaseName: "myapp"

That's all a user needs to write. No StatefulSet YAML, no Service YAML, no PVC YAML. Just "I want a PostgreSQL 16 instance with 1Gi storage."

Step 5: Run and Test

Let's test our Operator locally against a cluster:

# Install the CRDs into the cluster
make install

# Run the controller locally (for development)
make run

In another terminal, create a PostgresDB:

kubectl apply -f config/samples/database_v1alpha1_postgresdb.yaml

Check what was created:

$ kubectl get postgresdbs
NAME          VERSION   REPLICAS   PHASE     READY
my-postgres   16        1          Running   true

$ kubectl get statefulsets
NAME          READY   AGE
my-postgres   1/1     30s

$ kubectl get services
NAME          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
my-postgres   ClusterIP   None         <none>        5432/TCP   30s

$ kubectl get pvc
NAME                   STATUS   VOLUME   CAPACITY   ACCESS MODES   AGE
data-my-postgres-0     Bound    pv-xxx   1Gi        RWO            30s

From one ten-line YAML file, our Operator created a StatefulSet, a headless Service, and a PersistentVolumeClaim. That's the power of the Operator pattern.

Step 6: Build and Deploy

When you're ready to run the Operator inside the cluster (instead of locally):

# Build the Docker image
make docker-build IMG=your-registry/postgres-operator:v0.1.0

# Push to your registry
make docker-push IMG=your-registry/postgres-operator:v0.1.0

# Deploy to the cluster
make deploy IMG=your-registry/postgres-operator:v0.1.0

The Operator now runs as a Deployment in the postgres-operator-system namespace.

Taking It Further

Our Operator is functional but basic. A production-ready PostgreSQL Operator would include:

Password Management with Secrets

Instead of hardcoding the password, generate a random one and store it in a Kubernetes Secret:

// In your reconciler, before creating the StatefulSet:
secret := &corev1.Secret{
    ObjectMeta: metav1.ObjectMeta{
        Name:      fmt.Sprintf("%s-credentials", pgdb.Name),
        Namespace: pgdb.Namespace,
    },
    StringData: map[string]string{
        "POSTGRES_PASSWORD": generateRandomPassword(24),
    },
}

Then reference the Secret in the StatefulSet's env vars using SecretKeyRef.
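generateRandomPassword isn't part of the scaffold, so you'd write it yourself. One possible implementation using crypto/rand (the function name matches the snippet above, but is otherwise our own invention):

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// generateRandomPassword returns a URL-safe random string of length n.
// It reads from crypto/rand, so the result is suitable as a credential.
func generateRandomPassword(n int) string {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // crypto/rand failing means the system is badly broken
	}
	// Base64 of n bytes yields at least n characters, so slicing is safe.
	return base64.RawURLEncoding.EncodeToString(buf)[:n]
}

func main() {
	fmt.Println(len(generateRandomPassword(24))) // 24
}
```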

Health Checks

Add liveness and readiness probes to the container:

LivenessProbe: &corev1.Probe{
    ProbeHandler: corev1.ProbeHandler{
        Exec: &corev1.ExecAction{
            Command: []string{"pg_isready", "-U", "postgres"},
        },
    },
    InitialDelaySeconds: 30,
    PeriodSeconds:       10,
},
ReadinessProbe: &corev1.Probe{
    ProbeHandler: corev1.ProbeHandler{
        Exec: &corev1.ExecAction{
            Command: []string{"pg_isready", "-U", "postgres"},
        },
    },
    InitialDelaySeconds: 5,
    PeriodSeconds:       5,
},

Backup CRD

You could create another CRD called PostgresBackup that triggers pg_dump on a schedule:

apiVersion: database.thegeekyway.com/v1alpha1
kind: PostgresBackup
metadata:
  name: nightly-backup
spec:
  postgresRef: my-postgres
  schedule: "0 2 * * *"
  destination: "s3://my-backups/postgres/"

Status Conditions

Instead of a simple Ready boolean, use proper Kubernetes status conditions to report detailed state:

type PostgresDBStatus struct {
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

This follows Kubernetes conventions and makes your CRD work nicely with tools like kubectl wait.
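In practice you'd set conditions with meta.SetStatusCondition from apimachinery, but the semantics are easy to see in a simplified, self-contained sketch (Condition here is a hand-rolled stand-in for metav1.Condition, not the real type):

```go
package main

import (
	"fmt"
	"time"
)

// Condition mirrors the shape of metav1.Condition closely enough
// to illustrate the set-or-update semantics.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	LastTransitionTime time.Time
}

// setCondition updates a condition in place. The transition timestamp only
// moves when Status actually changes -- the convention that tools like
// kubectl wait rely on.
func setCondition(conds []Condition, newCond Condition) []Condition {
	for i, c := range conds {
		if c.Type == newCond.Type {
			if c.Status == newCond.Status {
				newCond.LastTransitionTime = c.LastTransitionTime
			}
			conds[i] = newCond
			return conds
		}
	}
	return append(conds, newCond)
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False", Reason: "Creating", LastTransitionTime: time.Now()})
	conds = setCondition(conds, Condition{Type: "Ready", Status: "True", Reason: "Running", LastTransitionTime: time.Now()})
	fmt.Println(len(conds), conds[0].Status) // 1 True
}
```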

Operator Frameworks Comparison

Kubebuilder isn't the only option. Here's a quick look at the alternatives:

Framework       Language            Complexity   Best For
Kubebuilder     Go                  Medium       Production operators with full Kubernetes API access
Operator SDK    Go, Ansible, Helm   Medium       Broader tooling; includes Kubebuilder under the hood
Kopf            Python              Low          Quick prototyping and simpler operators
Metacontroller  Any (webhooks)      Low          When you don't want to deal with client libraries

If Go isn't your thing, Kopf is a fantastic choice. You can write a basic operator in under 50 lines of Python. But for production workloads where performance and reliability matter, Go with Kubebuilder is the standard.

Summary

Building a Kubernetes Operator boils down to three concepts:

  1. Define a CRD that describes what users want (the "what")
  2. Write a controller that makes it happen (the "how")
  3. Use the reconciliation loop to continuously ensure desired state matches actual state

The PostgreSQL Operator we built is simple, but it demonstrates the core pattern. Every operator - whether it's managing databases, message queues, or ML training jobs - follows this same structure. The reconciliation loop is always: "What does the user want? What exists right now? What do I need to do to bridge the gap?"

The beauty of Operators is that once you encode your operational knowledge into one, it works tirelessly. No more 3 AM pager alerts for things that can be automated. And that, to me, is what good engineering is about.

If you have any questions about building Kubernetes Operators, please let us know in the comments section below.