Exoscale SKS (Scalable Kubernetes Service) enables you to easily scale your applications and keep them highly available.
However, let’s say you run your own database inside the cluster: What about storage?
Every node inside the Kubernetes cluster comes with a local disk. In theory, you could simply use this storage to save the data of your Pod (roughly speaking, a container in the non-Kubernetes world) - as depicted in the next picture.
But what happens when the Pod or even the whole node fails? A failed Pod will be rescheduled by Kubernetes, either on the same Node (provided it is still healthy) or on another Node. But data saved locally is only available on the specific Node where it was written, so a Pod rescheduled on another Node cannot access it. And if the Node fails completely, your data is ultimately lost.
That means you can't utilize the real benefits of Kubernetes (high availability, scalability) this way. For the sake of data safety, we don't recommend storing important data (solely) on local disks.
Distributing storage using Longhorn
Longhorn is open-source software that you can install inside your SKS cluster. When creating Kubernetes Volumes, you can choose Longhorn (via a storageclass) as the backend. It automatically discovers the disks of all nodes and will distribute and replicate your volumes across them. Additionally, it supports snapshots, backups to S3-compatible Object Storage like Exoscale SOS, and disaster recovery across clusters.
The picture above shows the basic functionality. Longhorn automatically discovers the local storage of all nodes and uses it for Kubernetes Volumes. That means when you create a volume, it will be replicated 3 times (the default setting) and distributed across nodes.
Installation of Longhorn
Installation of Longhorn is straightforward and only takes a few minutes. You need an SKS cluster and access to it via kubectl.
Get the link to the current Longhorn manifest from the Longhorn Docs.
Apply the Longhorn manifest this way, replacing VERSION:
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/VERSION/deploy/longhorn.yaml
After some minutes, all Longhorn Pods should be online. You can check this via kubectl get pods -n longhorn-system.
If you have errors, make sure to take a look at kubectl get events -n longhorn-system and kubectl logs PODNAME -n longhorn-system.
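If you prefer not to poll manually, you can also wait until all Pods in the namespace report Ready (the 5 minute timeout is just an example value):
kubectl wait --for=condition=Ready pods --all -n longhorn-system --timeout=300s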
Longhorn comes with a user interface. To access it you can use port-forwarding:
kubectl port-forward deployment/longhorn-ui 7000:8000 -n longhorn-system
Then use this URL to access its dashboard: http://127.0.0.1:7000
In the Longhorn UI you can, among other things:
- Create Volumes manually
- Backup/Snapshot Volumes
- Evict nodes
- Configure Longhorn
Creating a Persistent Volume for a Pod
We use a simple example to show how you can use Longhorn.
In theory, you can create a volume manually (in the UI) and then mount it in the manifest of a Pod. In practice, this is done differently: You can automate this procedure by using a PVC (Persistent Volume Claim) to create a PV (Persistent Volume), like in the following example:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-test
spec:
  containers:
  - name: container-test
    image: ubuntu
    imagePullPolicy: IfNotPresent
    command:
      - "sleep"
      - "604800"
    volumeMounts:
    - name: volv
      mountPath: /data
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: example-pvc
The container in the Pod defined below the PVC attaches the volume by referencing its name (example-pvc) inside the volumes block.
Even when you delete the Pod or the Pod fails, the volume stays intact. It will be deleted when you explicitly delete the PVC.
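To check that the claim was bound and that the mount works, you can, for example, inspect the PVC and write a test file into the mounted path:
kubectl get pvc example-pvc
kubectl exec pod-test -- sh -c "echo hello > /data/test && cat /data/test"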
A PVC can request different access modes; the two relevant here are:
- ReadWriteOnce
  - The Volume can only be attached to one Pod
- ReadWriteMany
  - The Volume can be mounted by multiple Pods at the same time
The latter uses an NFS layer to share the volume across Pods. As this comes with a performance penalty, database volumes are usually attached using ReadWriteOnce.
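For completeness, a ReadWriteMany claim looks almost identical - only the access mode changes (name and size below are placeholders):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi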
The PVC knows to use Longhorn because we specified longhorn as storageClassName. You can also create a custom storageclass, in which you can define the number of replicas, the backup schedule, or selectors that only allow volumes on specific nodes.
Statefulsets
When scaling a database or similar workload, each replica usually needs to be paired with its own volume. To scale and group multiple Pods of the same kind, two concepts exist in Kubernetes - Deployments and Statefulsets.
A Statefulset will enumerate its replicas/Pods, e.g. my-app-0, my-app-1, my-app-2 etc. - my-app-0 will then always be matched with volume-my-app-0.
When using Deployments, their Pods receive a random identifier, which is why they are not suitable for this use case. Deployments are used when all of their Pods should share one ReadWriteMany volume.
The following manifest is an example Statefulset. It will create 3 replicas, and as such 3 individual paired Volumes.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database
spec:
  selector:
    matchLabels:
      app: database
  replicas: 3
  serviceName: deployment-test
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: database-container
        image: ubuntu
        imagePullPolicy: IfNotPresent
        command:
          - "sleep"
          - "604800"
        volumeMounts:
        - name: database-volume
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: database-volume
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: longhorn
      resources:
        requests:
          storage: 1Gi
We can see the result by showing all Pods and PVCs:
❯ kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
database-0   1/1     Running   0          5m14s
database-1   1/1     Running   0          4m43s
database-2   1/1     Running   0          4m17s
❯ kubectl get pvc
NAME                         STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
database-volume-database-0   Bound    pvc-5fc92d99   1Gi        RWO            longhorn       5m19s
database-volume-database-1   Bound    pvc-e797976f   1Gi        RWO            longhorn       4m48s
database-volume-database-2   Bound    pvc-7e4328a0   1Gi        RWO            longhorn       4m22s
Longhorn and Exoscale SKS in practice: Installing a database cluster
Installing a database is really easy using Helm packages!
First, make sure that Longhorn is the default storage provider in your cluster. To do so, you can simply patch the default storageclass (or create a new storageclass with the respective annotation):
kubectl patch storageclass longhorn -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
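You can verify the change; the longhorn storageclass should now be marked as (default):
kubectl get storageclass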
To use Helm, you need to install the Helm CLI on your local computer; see the Helm website: https://helm.sh/docs/
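If you don't have Helm yet, one common option (described in the Helm docs) is the installer script; your package manager works as well:
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash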
In this case, we quickly test setting up a MariaDB Galera Cluster. For that, we use the respective Bitnami helm chart.
Installation is uncomplicated:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/mariadb-galera
The first command adds the Bitnami chart repository locally on your computer. The second installs the mariadb-galera chart; you can replace my-release with an arbitrary name.
❯ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
my-release-mariadb-galera-0   1/1     Running   0          1h
my-release-mariadb-galera-1   1/1     Running   0          1h
my-release-mariadb-galera-2   1/1     Running   0          1h
❯ kubectl get pvc
NAME                               STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-release-mariadb-galera-0   Bound    pvc-622faea9   8Gi        RWO            longhorn       1h
data-my-release-mariadb-galera-1   Bound    pvc-406c8c7e   8Gi        RWO            longhorn       1h
data-my-release-mariadb-galera-2   Bound    pvc-17475177   8Gi        RWO            longhorn       1h
We can now see that it created 3 replicas with 3 corresponding Volumes. When installing the chart, Helm prints instructions on how to open a console in the database:
kubectl run my-release-mariadb-galera-client --rm --tty -i --restart='Never' --namespace default --image docker.io/bitnami/mariadb-galera:10.5.10-debian-10-r26 --command \
-- mysql -h my-release-mariadb-galera -P 3306 -uroot -p$(kubectl get secret --namespace default my-release-mariadb-galera -o jsonpath="{.data.mariadb-root-password}" | base64 --decode) my_database
MariaDB [my_database]>
You can also scale the created Statefulset up and down. When scaling down, the created Volumes won’t be deleted, as the corresponding PVC still exists.
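For example, a quick way to try this is scaling the StatefulSet created by the chart directly (for permanent changes you would typically adjust the chart values instead):
kubectl scale statefulset my-release-mariadb-galera --replicas=5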
To connect your applications to it, you can use the internal ClusterIP service:
❯ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-release-mariadb-galera ClusterIP 10.100.60.163 <none> 3306/TCP 1h
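Inside the cluster, the database is also reachable via the service's fully qualified DNS name (assuming the chart was installed in the default namespace):
my-release-mariadb-galera.default.svc.cluster.local:3306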
And that’s how you can create a whole database cluster inside Kubernetes in less than 15 minutes!
Scheduling backups
Backups are an important topic. That's why Longhorn supports backing up volumes to S3. To connect Longhorn with Exoscale SOS (Simple Object Storage), we have a tutorial available here.
As soon as you have configured it, you can create backups of your Volumes, which will be saved separately outside your cluster in an Exoscale storage bucket you created - potentially in a different zone. Also when you create a new cluster and apply the same backup target, then Longhorn will discover all your old backups automatically.
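As a rough sketch of what the linked tutorial configures (bucket name, zone, and keys below are placeholders): Longhorn needs a credentials secret in the longhorn-system namespace and a backup target pointing at your bucket.
apiVersion: v1
kind: Secret
metadata:
  name: exoscale-sos-credentials
  namespace: longhorn-system
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "EXO..."                    # Exoscale API key (placeholder)
  AWS_SECRET_ACCESS_KEY: "..."                   # Exoscale API secret (placeholder)
  AWS_ENDPOINTS: "https://sos-ch-gva-2.exo.io"   # SOS endpoint of the bucket's zone
In the Longhorn settings, the backup target is then set to something like s3://BUCKETNAME@ch-gva-2/ and the backup target credential secret to exoscale-sos-credentials.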
To create backups automatically, you have to create a custom storageclass. There you also have the opportunity to change further options like the number of replicas:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: my-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
parameters:
  dataLocality: "disabled"
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  fromBackup: ""
  recurringJobs: '[
    {
      "name":"snap",
      "task":"snapshot",
      "cron":"0 */2 * * *",
      "retain":1
    },
    {
      "name":"backup",
      "task":"backup",
      "cron":"0 */12 * * *",
      "retain":20
    }
  ]'
This sample config will create a snapshot locally every 2 hours, retaining only the latest one. Additionally, every 12 hours, volumes using this storageclass will be backed up to the object storage bucket; 20 backups will be retained. Using the annotation, we set the storageclass as the default one.
Data locality
Longhorn provides great performance. However, in some cases a Pod can be scheduled on a node where no replica of its attached volume is available. In this case, the Pod accesses the Volume over the network, which can degrade performance.
That's why (ideally for databases) you can turn on Data Locality: set dataLocality in your storageclass to best-effort and Longhorn will try to always keep a replica on the same node as the attached Pod.
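For example, a storageclass with data locality enabled could look like this (the name is just a placeholder):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-local
provisioner: driver.longhorn.io
parameters:
  dataLocality: "best-effort"
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"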
Updating nodes
To update the nodes in the Kubernetes cluster, one replaces nodes one by one. Longhorn is able to copy the data to new nodes.
Go to the Longhorn UI and select the node with which you want to start. Then click on Edit Node, disable scheduling, and set Eviction Requested to true.
You will then notice that the number of replicas on that node drops to 0. You can then evict the node from the Exoscale SKS node pool using the Exoscale web interface or the CLI. Type into the console:
exo sks nodepool evict CLUSTERNAME CLUSTERNODEPOOL NODENAME
The node name is the same as shown in Longhorn. Alternatively, you can go into the Exoscale web interface -> SKS -> Your Cluster -> Your Nodepool -> Click on “…” besides the node and then on Evict.
To replace the node, scale the node pool up again with:
exo sks nodepool scale CLUSTERNAME CLUSTERNODEPOOL SIZE
As size, use the number of nodes you had before the eviction. To make things quicker, you can also scale up the node pool beforehand. You then can directly evict multiple nodes at once in Longhorn.
Continue with this procedure until all nodes are replaced.