Storage Protocols and Metrics¶
This page gives a practical overview of common storage protocols and the metrics you should monitor when sizing and operating storage systems.
Common protocols¶
- iSCSI: Block over IP, common for VMs and databases.
- NFS: Network file system for file sharing among apps.
- SMB/CIFS: File protocol commonly used in Windows environments.
- RBD (Ceph RADOS Block Device): Ceph native block device.
- S3 / Object Storage: Object interface for backups, unstructured data and data lakes.
Key metrics¶
- IOPS (operations/sec): number of read/write operations completed per second.
- Latency (ms): response time per operation (p99, p95).
- Throughput (MB/s): effective bandwidth for sequential ops.
- Queue depth: number of outstanding I/Os at hosts and controllers.
- Utilization: CPU/Network/Disk utilization on storage nodes.
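These metrics are related: throughput is roughly IOPS times I/O size, so the same device can look very different depending on block size. A minimal sketch with illustrative numbers:

```python
def throughput_mb_s(iops: float, block_size_kb: float) -> float:
    """Approximate throughput (MB/s) from IOPS and I/O size (KiB)."""
    return iops * block_size_kb / 1024  # KB/s -> MB/s

# 20k IOPS at 4 KiB (OLTP-like) vs 1k IOPS at 1 MiB (streaming-like)
print(throughput_mb_s(20_000, 4))     # 78.125
print(throughput_mb_s(1_000, 1024))   # 1000.0
```

This is why quoting IOPS without the block size (or throughput without the access pattern) is close to meaningless.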
Measurement best practices¶
- Capture both steady-state and peak patterns.
- Use appropriate tools: `fio` for block, `rclone`/`s3bench` for object, `iperf` for network.
- Measure latency percentiles (p50/p95/p99), not only averages.
- Correlate with network/CPU metrics to find bottlenecks.
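As an illustration, a `fio` job file for a steady-state 4 KiB random-read test might look like the following (the device path, runtime, and queue depths are placeholders to adapt; `fio` reports completion-latency percentiles by default):

```ini
; steady-state random-read latency test
[global]
ioengine=libaio
direct=1
time_based=1
runtime=300

[randread-4k]
filename=/dev/sdX   ; placeholder device, DO NOT run against a disk with data
rw=randread
bs=4k
iodepth=16
numjobs=4
group_reporting=1
```

Run the same job at several `iodepth` values to see where latency percentiles start to degrade.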
Operational recommendations¶
- Design headroom for peaks (e.g., +30% IOPS/throughput).
- Avoid oversubscription in critical tiers.
- Use QoS/limits to isolate noisy neighbors.
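The +30% headroom guideline can be turned into a trivial sizing check. A sketch, with an illustrative measured peak:

```python
def required_capacity(peak_iops: float, headroom: float = 0.30) -> float:
    """Minimum IOPS the tier should sustain: measured peak plus headroom."""
    return peak_iops * (1 + headroom)

peak = 45_000  # example measured peak IOPS
print(required_capacity(peak))  # peak plus 30% headroom
```

The same calculation applies to throughput (MB/s); size against the peak, not the average.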
Quick choice: iSCSI vs NFS vs SMB¶
- Databases/VMs: iSCSI/RBD (block) for latency and queue control; multipath + ALUA enabled.
- Shared apps: NFSv4.1 (pNFS if available) for file workloads or RWX containers.
- End-user shares: SMB with signing/encryption as policy requires.
- Containers RWX: NFS/SMB CSI when POSIX/ACL semantics are needed.
- Containers RWO: RBD/iSCSI CSI for statefulsets and databases.
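For the RWO case, a PVC bound to a block-backed class (here the `gold-rbd` StorageClass shown later on this page; the claim name and size are illustrative) could look like:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data          # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gold-rbd
  resources:
    requests:
      storage: 100Gi     # illustrative size
```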
Restic/Borg with distributed storage (Ceph/MinIO)¶
- Repo: S3 (Ceph RGW/MinIO) with versioning on; separate buckets per environment.
- Concurrency: cap `--limit-upload`/`--max-repack-size` to avoid overloading OSDs during prune/compact.
- Encryption: manage keys outside the cluster; rotate them and test restores regularly.
- Retention: `--keep-daily`/`--keep-weekly`/`--keep-monthly`; schedule `restic forget --prune` off-peak.
- Health: monthly restore tests into an isolated bucket; measure backend latency/throughput.
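Scheduled off-peak, the retention pass might be wired up via cron; the repository URL, bucket, retention counts, and schedule below are placeholders:

```
# Sunday 02:30, outside business hours (placeholder schedule)
30 2 * * 0 restic -r s3:https://rgw.example.com/backups-prod forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
```

Credentials (`RESTIC_PASSWORD`, S3 keys) should come from the environment or a secrets manager, not the crontab itself.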
Container storage optimization (Kubernetes + CSI)¶
- StorageClasses: one per tier (`gold`/`silver`/`bronze`) with a proper `reclaimPolicy` (`Retain` for prod, `Delete` for dev).
- Binding: `volumeBindingMode: WaitForFirstConsumer` to avoid scheduling pods on nodes without storage paths.
- RWX: NFS/SMB CSI or RWX provisioners; verify `fsGroup`/permissions.
- Snapshots/clones: define a `VolumeSnapshotClass` and use clones for fast dev/test.
- Topology: `allowedTopologies` with zone/rack labels to prevent cross-rack mounts.
Example StorageClass (block):
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gold-rbd
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  pool: rbd-gold
  imageFeatures: layering,exclusive-lock,object-map,fast-diff,deep-flatten
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
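To enable snapshots and clones against the same driver, a matching VolumeSnapshotClass might look like this sketch; note that real Rook/Ceph deployments usually also require the snapshotter secret parameters, which are deployment-specific and omitted here:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gold-rbd-snap   # illustrative name
driver: rook-ceph.rbd.csi.ceph.com
deletionPolicy: Delete  # Retain in prod if snapshots must outlive their PVCs
```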