> After you have filled that up, if such a host crashes or needs
> maintenance, another 80-100TB will need recreating from the other huge
> drives.

A judicious setting of mon_osd_down_out_subtree_limit can help mitigate the thundering herd, FWIW.

> I don't think there are specific limitations on the size itself, but
> as the single drive becomes larger and larger, just adding a new host
> or a drive will mean the cluster is rebalancing for days or weeks if
> not more. Especially if EC is used.

A corollary here is that IOPS/TB decrease as HDDs grow larger. We see some incremental tweaks, but in the end the interface speed hasn’t grown in some time. Seek and rotational latency are helped somewhat by increasing areal density, though capacity growth is also achieved by making the platters increasingly thinner and more numerous: recent drives pack as many as nine in there (perhaps fewer for SMR models). I’ve seen scale deployments cap HDD size at, say, 8TB because the IOPS/TB beyond that was increasingly untenable, depending of course on the use-case.

> At some point you would end up having the cluster almost never in
> HEALTH_OK state because of normal replacements, expansions and other
> surprises

With recent releases backfill doesn’t trigger HEALTH_WARN, though, right?

> which in turn could cause secondary problems with mon DBs
> and things like that.

Your point is well made, though: Dan @ CERN observed several years ago that with a sufficiently large cluster one has to come to terms with backfill going on all the time. The idea here is that mon DB compaction tends to block if there is any degradation; with at least some releases, that means even if you have HEALTH_OK but some OSDs are down/out.
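On the mon_osd_down_out_subtree_limit point above, a minimal sketch of what that could look like, assuming the failure domain you want to protect against auto-out is a whole host (the option's default is "rack"; the choice of "host" here is illustrative, not a recommendation):

    # Tell the mons not to automatically mark OSDs "out" when an entire
    # subtree of this CRUSH type (here: a whole host) goes down, so a host
    # crash or maintenance window doesn't immediately kick off
    # re-replication of its 80-100TB onto the surviving drives.
    ceph config set mon mon_osd_down_out_subtree_limit host

    # Confirm the value the mons are actually using.
    ceph config get mon mon_osd_down_out_subtree_limit

The trade-off is that a down host then stays degraded until someone (or some automation) decides to mark its OSDs out and let recovery proceed.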
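To put rough numbers on the IOPS/TB point, here is a back-of-the-envelope sketch; the ~75 random IOPS per 7.2K RPM spindle is an assumed rule-of-thumb figure, not a benchmark:

    # Random IOPS per spindle stay roughly flat as capacity grows,
    # so IOPS/TB falls as drives get bigger.
    for tb in 4 8 16 20; do
        printf '%2s TB drive: ~%s IOPS/TB\n' "$tb" "$(echo "scale=1; 75/$tb" | bc)"
    done
    # Prints roughly 18.7, 9.3, 4.6 and 3.7 IOPS/TB respectively.

Which is the arithmetic behind capping HDD size at 8TB or so for IOPS-sensitive use-cases.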