Skip to content

Storage Protocols and Metrics

This page provides a practical view on common storage protocols and the metrics you should monitor for sizing and operating storage systems:

Common protocols

  • iSCSI: Block over IP, common for VMs and databases.
  • NFS: Network file system for file sharing among apps.
  • SMB/CIFS: File protocol commonly used in Windows environments.
  • RBD (Ceph RADOS Block Device): Ceph native block device.
  • S3 / Object Storage: Object interface for backups, unstructured data and data lakes.

Key metrics

  • IOPS (operations/sec): measures number of I/O operations.
  • Latency (ms): response time per operation (p99, p95).
  • Throughput (MB/s): effective bandwidth for sequential ops.
  • Queue depth: queue depths at hosts and controllers.
  • Utilization: CPU/Network/Disk utilization on storage nodes.

Measurement best practices

  • Capture both steady-state and peak patterns.
  • Use tools: fio for block, rclone/s3bench for object, iperf for network.
  • Measure latency percentiles (p50/p95/p99) not only averages.
  • Correlate with network/CPU metrics to find bottlenecks.

Operational recommendations

  • Design headroom for peaks (e.g., +30% IOPS/throughput).
  • Avoid oversubscription in critical tiers.
  • Use QoS/limits to isolate noisy neighbors.

Quick choice: iSCSI vs NFS vs SMB

  • Databases/VMs: iSCSI/RBD (block) for latency and queue control; multipath + ALUA enabled.
  • Shared apps: NFSv4.1 (pNFS if available) for file workloads or RWX containers.
  • End-user shares: SMB with signing/encryption as policy requires.
  • Containers RWX: NFS/SMB CSI when POSIX/ACL semantics are needed.
  • Containers RWO: RBD/iSCSI CSI for statefulsets and databases.

Restic/Borg with distributed storage (Ceph/MinIO)

  • Repo: S3 (Ceph RGW/MinIO) with versioning on; separate buckets per environment.
  • Concurrency: cap --limit-upload/--max-repack-size to avoid overloading OSDs during prune/compact.
  • Encryption: manage keys outside the cluster; rotate and test restores regularly.
  • Retention: keep-daily/weekly/monthly; schedule restic forget --prune off-peak.
  • Health: monthly restore tests into an isolated bucket; measure backend latency/throughput.

Container storage optimization (Kubernetes + CSI)

  • StorageClasses: per tier (gold/silver/bronze) with proper reclaimPolicy (Retain prod, Delete dev).
  • Binding: volumeBindingMode: WaitForFirstConsumer to avoid scheduling on nodes without storage paths.
  • RWX: NFS/SMB CSI or RWX provisioners; verify fsGroup/permissions.
  • Snapshots/clones: define VolumeSnapshotClass and use clones for fast dev/test.
  • Topology: allowedTopologies with zone/rack labels to prevent cross-rack mounts.

Example StorageClass (block):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
    name: gold-rbd
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
    pool: rbd-gold
    imageFeatures: layering,exclusive-lock,object-map,fast-diff,deep-flatten
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer