We're still struggling to get our Ceph cluster back to HEALTH_OK. As I
understand it, we have several compounding issues interfering with recovery.
To summarize: we have a cluster of 22 OSD nodes running Ceph 16.2.x.
About a month back one of the OSD nodes broke down (just the OS disk,
but we didn't have a cold spare available, so it took a week to get it
fixed). Since the node failure, Ceph has of course been repairing the
damage, but that exposed another problem: our OSDs are really unevenly
balanced (the lowest below 50% full, the highest around 85%). So
whenever a disk fails (and two have since then), the load spreads over
the remaining OSDs and our fullest OSDs go over the 85% threshold,
slowing down recovery, normal use and rebalancing.
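For reference, the imbalance shows up directly in the per-OSD
utilization report (standard commands, nothing cluster-specific
assumed):

```shell
# Per-OSD utilization; the %USE column is what's so uneven here.
ceph osd df tree

# The last lines of `ceph osd df` summarize the spread
# (TOTAL plus MIN/MAX VAR and STDDEV across OSDs):
ceph osd df | tail -n 2
```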
We also had issues with degraded PGs that weren't being repaired
(because we had enabled scrubbing during recovery, after getting
messages that lots of PGs weren't being scrubbed in time).
Now there's still one PG degraded because one object is unfound. This
whole error state has been dragging on far too long, and while it has, I
started wondering why the balancer wasn't doing its job. It turns out
the balancer only runs when the cluster is OK, or at least has nothing
degraded in it. But the balancer hadn't done its job even when our
cluster was healthy for a long time before: we added some 8 nodes a few
years ago, and the newer nodes still have the lowest-used OSDs.
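The balancer side can at least be inspected with the standard mgr
balancer module commands (assuming the module is enabled); switching to
upmap mode is often suggested for clusters that crush-compat leaves
unbalanced, though that's an assumption about this cluster, not
something I've verified here:

```shell
# Is the balancer active, and in which mode?
ceph balancer status

# upmap mode generally balances much better than crush-compat,
# but requires all clients to be Luminous or newer:
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
```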
Our cluster is at about 70-71% usage overall, but with this imbalance we
cannot grow any more. Between the single-node failure (now resolved) and
ongoing disk failures (we are seeing a handful of OSDs with
read-repaired messages), it looks like we won't get back to HEALTH_OK
for a while.
I'm trying to mitigate this by reweighting the fullest OSDs, but they
keep going back over the threshold, while the emptiest OSDs have plenty
of space (just 55% full now).
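As a back-of-the-envelope for the manual reweighting: scaling an OSD's
reweight by (cluster average utilization / that OSD's utilization)
gives a first guess at a value that pulls it back toward the mean. The
70/85 figures below are just this cluster's numbers, plugged in as an
example:

```shell
# Rough reweight estimate for one overfull OSD:
#   new_reweight = current_reweight * (avg %USE / this OSD's %USE)
avg=70    # cluster average %USE (from `ceph osd df`)
use=85    # this OSD's %USE
cur=1.0   # this OSD's current reweight
awk -v a="$avg" -v u="$use" -v c="$cur" \
    'BEGIN { printf "%.2f\n", c * a / u }'   # prints 0.82
# then: ceph osd reweight <osd-id> 0.82
```

Ceph can also do this across the board: `ceph osd
test-reweight-by-utilization` shows what it would change (dry run), and
`ceph osd reweight-by-utilization` applies it.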
If you've read this far ;-) my question is: can I force a repair of that
PG, around all the restrictions, so it doesn't block automatic
rebalancing? It seems to me that would help, but perhaps there are other
things I can do as well?
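For the unfound object specifically, the standard escape hatch I've
found in the docs looks like the below (`<pgid>` is a placeholder for
the degraded PG's id); I'd appreciate confirmation that this is the
right move, since it means giving up on that object:

```shell
# Identify the degraded PG and its unfound object:
ceph health detail
ceph pg <pgid> list_unfound

# If the object is truly unrecoverable, tell Ceph to give up on it.
# WARNING: this is data loss for that object. "revert" rolls back to a
# previous version if one exists; "delete" forgets the object entirely.
ceph pg <pgid> mark_unfound_lost revert
# or: ceph pg <pgid> mark_unfound_lost delete
```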
(Budget wise, adding more OSD nodes is a bit difficult at the moment...)
Thanks for reading!
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx