Hi all,

We've been having trouble with our Ceph cluster for over a week now. A short summary of our situation:

- The original cluster had 10 OSD nodes, each with 16 OSDs
- An expansion was necessary, so another 6 nodes were added
- Version: 14.2.11

Last week we saw heavily loaded OSD servers; with the help received here we identified the disk load as being caused by compaction of the RocksDB. As mentioned, taking a disk offline and running

    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/xxxx compact

does take the load away temporarily (the exact sequence we use is in the P.S. below).

Most of the new disks still have a weight of 0, as we want to get the system stable first, but there is something I simply don't understand. Even when setting the flags noout, norecover, nobackfill and norebalance prior to taking a disk offline for compaction, we still see a rise in degraded PGs after the OSD is marked "up" again.

Last night a flapping OSD was also temporarily marked "down", I assume because it was heavily loaded, again causing a rise in degraded PGs. I know there is a "nodown" flag, but I've never used it. Reading the docs, they state these flags are "temporary" and that the blocked action will be performed afterwards anyway...

So I have a few questions:

1. Why is the cluster marking PGs as "degraded" and reporting degraded data redundancy, when this was not the case before? The count keeps rising (2196398/10339524249 objects degraded (0.021%)) and I simply cannot understand why.
2. The "nodown" flag: can I use it to prevent the flapping? I don't want to make the mess any deeper. As far as I understand, it should help in our case, since the OSDs are heavily loaded.
3. Is it a good idea to start adding the other disks as well (slowly increasing their weight)?

Thanks,
Kristof
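
P.S. For reference, the full sequence we run for each offline compaction looks roughly like this (the OSD id and data path are placeholders):

    # quiesce recovery/rebalancing before taking the OSD down
    ceph osd set noout
    ceph osd set norecover
    ceph osd set nobackfill
    ceph osd set norebalance

    # stop the OSD daemon and compact its RocksDB offline
    systemctl stop ceph-osd@xxxx
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/xxxx compact

    # bring the OSD back up, then clear the flags again
    systemctl start ceph-osd@xxxx
    ceph osd unset norebalance
    ceph osd unset nobackfill
    ceph osd unset norecover
    ceph osd unset noout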
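
P.P.S. For question 3, by "slowly increasing their weight" I mean stepping the CRUSH weight up in small increments and letting the cluster settle in between, along the lines of (osd id and weight values illustrative only):

    ceph osd crush reweight osd.xxxx 0.2
    # wait for backfill to finish (check with: ceph -s), then step up again
    ceph osd crush reweight osd.xxxx 0.5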