> We recently started experimenting with Proxmox Backup Server,
> which is really cool, but performs enough IO to basically lock
> out the VM being backed up, leading to IO timeouts, leading to
> user complaints. :-(

The two most common things I have had to fix over the years in
storage systems I have inherited have been:

* Too low IOPS-per-TB to handle a realistic workload.

* Too few total IOPS to handle the user plus sysadmin (checking,
  scrubbing, backup, balancing, backfilling, ...) workloads.

Both happen because most sysadmins are heavily incentivized to
save money now, even if there is a huge price to pay later when
the storage capacity fills up.

An SSD-based storage cluster like the one you have to deal with
has plenty of IOPS, so your case is strange, in particular
because latencies in your tests are low at the same time as IO
rates are low; badly overloaded storage complexes have latencies
of 1 second and way above. That your tests report small average
latencies but a max latency of 37s, with long pauses at 0 IOPS,
is suspicious.

It could be that *some* OSD SSDs are not in good condition and
they slow down everything, as the Ceph daemons wait for the
slowest OSD to respond. 37s looks like retries on a failing SSD.

In an ideal world you would have on the cluster a capacity
monitor like Ganglia etc. showing year-long graphs of network
bandwidth, IO rates and latencies, but I guess it was not set up
like that.

> The SSDs are probably 5-8 years old. The OSDs were rebuilt to
> bluestore around the luminous timeframe. (Nautilus, maybe. It
> was a while ago.)

>> Newer SSD controllers / models are better than older models
>> at housekeeping over time, so the secure-erase might freshen
>> performance.

Indeed, 5-8 year old firmware may not be as sophisticated as
more recent firmware, in particular as to needing periodic
explicit TRIMs.
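As an aside (this sketch is mine, not from the thread): one way to
narrow down which OSDs are dragging is `ceph osd perf`, which reports
per-OSD commit/apply latencies; with `--format json` its output can be
ranked to spot the outliers. The field names below (`osd_perf_infos`,
`perf_stats`, `commit_latency_ms`, `apply_latency_ms`) are what I
recall from Nautilus-era releases, so treat them as assumptions and
check against your version; the sample JSON here is made up for
illustration:

```python
import json

def rank_osds_by_latency(perf_json: str):
    """Sort OSDs by commit latency, worst first.

    Expects the JSON from `ceph osd perf --format json`; the field
    names are assumed from Nautilus-era output and may differ on
    other releases.
    """
    data = json.loads(perf_json)
    # Some releases nest the list under an "osdstats" key.
    infos = data.get("osdstats", data).get("osd_perf_infos", [])
    return sorted(
        ((i["id"],
          i["perf_stats"]["commit_latency_ms"],
          i["perf_stats"]["apply_latency_ms"]) for i in infos),
        key=lambda t: t[1],
        reverse=True,
    )

# Made-up sample output: osd.13 is the obvious outlier.
SAMPLE = json.dumps({"osd_perf_infos": [
    {"id": 2,  "perf_stats": {"commit_latency_ms": 4,    "apply_latency_ms": 4}},
    {"id": 13, "perf_stats": {"commit_latency_ms": 2100, "apply_latency_ms": 2100}},
    {"id": 25, "perf_stats": {"commit_latency_ms": 7,    "apply_latency_ms": 7}},
]})

if __name__ == "__main__":
    for osd_id, commit_ms, apply_ms in rank_osds_by_latency(SAMPLE):
        print(f"osd.{osd_id}: commit {commit_ms} ms, apply {apply_ms} ms")
```

Since a single failing SSD stalls every placement group it
participates in, the worst entry in such a ranking is the first
suspect to cross-check against its SMART data.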
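Relatedly, before scheduling explicit TRIMs it is worth confirming
that the kernel actually exposes discard on each OSD device. A
minimal sketch of mine (not from the thread) reading the standard
sysfs attribute; the `sysfs_root` parameter is an artifact I added
only so the function can be tested against a fake tree:

```python
from pathlib import Path

def supports_discard(device: str, sysfs_root: str = "/sys/block") -> bool:
    """True if the kernel reports a nonzero discard granularity for
    the block device, i.e. TRIM/discard can actually reach it."""
    attr = Path(sysfs_root) / device / "queue" / "discard_granularity"
    try:
        return int(attr.read_text().strip()) > 0
    except (OSError, ValueError):
        # Missing attribute or unreadable value: assume no discard.
        return False

if __name__ == "__main__":
    root = Path("/sys/block")
    if root.is_dir():
        for dev in sorted(p.name for p in root.iterdir()):
            print(dev, "discard supported:", supports_discard(dev))
```

If discard does reach the devices, BlueStore also has a
`bdev_enable_discard` option for issuing discards itself, though I
would read the docs for your release before enabling it.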
On the TRIM point, I noticed this:

>>> Its primary use is serving RBD VM block devices for Proxmox

A VM workload, and in particular RBD, often involves very small
random writes, and "mixed-use" SSDs are not as suitable for that,
in particular if the usual and insane practice of having VM
operating systems log to virtual disks has been followed.

So the physical storage on the SSDs may have become hideously
fragmented, thus indeed requiring TRIMs, especially if the
endurance levels are low (which is dangerous), and especially if
the workload never pauses long enough for the firmware compaction
mechanism to run (which is likely, given that the storage complex
cannot sustain both the user workload and backups).

In particular, check the logs of these OSDs to see which specific
SSDs are reporting the slowest IOPS:

>>> 36 slow ops, oldest one blocked for 37 sec, daemons
>>> [osd.10,osd.12,osd.13,osd.14,osd.15,osd.17,osd.2,osd.25,osd.28,osd.3]...

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx