> but simply on the physical parameter of IOPS-per-TB (a "figure of merit"
> that is widely underestimated or ignored)

hear hear!

> of HDDs, and having enough IOPS-per-TB to sustain both user and admin
> workload.

Even with SATA SSDs I twice had to expand a cluster to meet SLO long before
it was nearly full. The SNIA TCO calculator includes a multiplier for the
number of drives one has to provision for semi-acceptable IOPS.

> A couple of legacy Ceph instances I saw in the past had 8TB and
> 18TB drives and as they got full the instances basically
> congealed (latencies in the several seconds or even dozens of
> seconds range) even under modest user workloads, and anyhow
> expensive admin workloads like scrubbing (never mind deep
> scrubbing) got behind by a year or two, and rebalancing was
> nearly impossible. Again not because of Ceph.

Been there, ITSY'd. Fragmentation matters with rotational media, even with
op re-ordering within the drive or the driver.

> But that is completely different: SSDs have *much* higher IOPS,
> even SATA ones, so even large SSDs have enormously better
> IOPS-per-TB.

And IOPS-per-yourlocalcurrency. Coarse-IU QLC is a bit of a wrinkle
depending on the workload...

>> I would like to point out that there are scale-out storage
>> systems that have adapted their architecture to this scenario
>> and use large HDDs very well.
>
> That is *physically impossible* as they just do not have enough
> IOPS-per-TB for many "live" workloads. The illusion that they
> might work well happens in one of two cases:
>
> * Either because they have not filled up yet,

I saw this with RGW on ultra-dense HDD toploaders.

> or because they
> have filled up but only a minuscule subset of the data is in
> active use, so the IOPS-per-*active*-TB of the user workload is
> still good enough.

Archival workloads - sure. Sometimes even backups. Even then,
prudently-sourced QLC often has superior TCO compared to spinners.

> * If the *active data* is mostly read-only and gets cached on a
> SSD tier of sufficient size, and admin workload does not
> matter.

And sometimes that data is only active because of full backups, a process
that effectively flushes the cache to boot.

> I have some idea of how Qumulo does things and that is very
> unlikely; Ceph is not fundamentally inferior to their design.
> Perhaps the workload's anisotropy matches particularly well that
> of that particular Qumulo instance:

Like a DB that's column-oriented vs row-oriented?
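
To put rough numbers on the IOPS-per-TB figure of merit (and its
IOPS-per-*active*-TB cousin), here is a quick back-of-the-envelope sketch in
Python. The drive names, capacities and random-IOPS figures are assumed
nominal values for illustration only, not vendor specs or measurements of any
particular drive:

    # Rough IOPS-per-TB figure of merit.
    # The drive names, capacities and random-IOPS numbers below are
    # illustrative assumptions, not vendor specs or measurements.
    drives = {
        # name: (capacity in TB, sustained random IOPS)
        "8TB 7.2k HDD":    (8.0,     200),
        "18TB 7.2k HDD":   (18.0,    200),
        "7.68TB SATA SSD": (7.68, 70_000),
    }

    for name, (tb, iops) in drives.items():
        print(f"{name:16s} {iops / tb:9.1f} IOPS/TB")

    # IOPS-per-*active*-TB: the same ratio, but measured against only
    # the hot subset of the data rather than the raw capacity.
    def iops_per_active_tb(drive_iops: float, active_tb: float) -> float:
        return drive_iops / active_tb

    # e.g. an 18TB HDD of which only 0.5TB is actually hot:
    print(f"{iops_per_active_tb(200, 0.5):.0f} IOPS per active TB")

With those assumed numbers the 18TB HDD lands around 11 IOPS/TB while the
SATA SSD lands around 9000 IOPS/TB, i.e. roughly three orders of magnitude
apart - which is the gap behind the "physically impossible" remark above. The
HDD figure only looks tolerable when the denominator shrinks to the active
subset of the data.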