On 4/20/22 12:34, Nikola Ciprich wrote:
Hi fellow ceph users and developers,
we've gotten into quite a strange situation that I'm not sure
isn't a ceph bug..
we have a 4-node Ceph cluster with multiple pools. one of them
is a SATA EC 2+2 pool containing 4x3 10TB drives (one of them
is actually 12TB)
one day, after a planned downtime of the fourth node, we got into a strange
state where there seemed to be a large amount of degraded PGs
to recover (even though we had noout set for the duration of the downtime).
the weird thing was that the OSDs of that node seemed to be almost full (i.e.
80%) while there were almost no PGs on them according to osd df tree,
leading to backfilltoofull..
after some experimenting, I dropped those OSDs and recreated them, but during
the recovery we got into the same state:
-31 120.00000 - 112 TiB 81 TiB 80 TiB 36 GiB 456 GiB 31 TiB 72.58 1.06 - root sata-archive
-32 30.00000 - 29 TiB 20 TiB 20 TiB 10 GiB 133 GiB 9.5 TiB 67.48 0.99 - host v1a-sata-archive
5 hdd 10.00000 1.00000 9.2 TiB 6.2 TiB 6.1 TiB 3.5 GiB 47 GiB 3.0 TiB 67.78 0.99 171 up osd.5
10 hdd 10.00000 1.00000 9.2 TiB 6.2 TiB 6.2 TiB 3.6 GiB 48 GiB 2.9 TiB 68.06 1.00 171 up osd.10
13 hdd 10.00000 1.00000 11 TiB 7.3 TiB 7.3 TiB 3.2 GiB 38 GiB 3.6 TiB 66.73 0.98 170 up osd.13
-33 30.00000 - 27 TiB 19 TiB 18 TiB 11 GiB 139 GiB 9.0 TiB 67.39 0.99 - host v1b-sata-archive
19 hdd 10.00000 1.00000 9.2 TiB 6.1 TiB 6.1 TiB 3.5 GiB 46 GiB 3.0 TiB 67.11 0.98 171 up osd.19
28 hdd 10.00000 1.00000 9.2 TiB 6.1 TiB 6.0 TiB 3.5 GiB 46 GiB 3.1 TiB 66.44 0.97 170 up osd.28
29 hdd 10.00000 1.00000 9.2 TiB 6.3 TiB 6.2 TiB 3.6 GiB 48 GiB 2.9 TiB 68.61 1.00 171 up osd.29
-34 30.00000 - 27 TiB 19 TiB 19 TiB 11 GiB 143 GiB 8.6 TiB 68.65 1.00 - host v1c-sata-archive
30 hdd 10.00000 1.00000 9.2 TiB 6.3 TiB 6.2 TiB 3.8 GiB 48 GiB 2.8 TiB 68.91 1.01 171 up osd.30
31 hdd 10.00000 1.00000 9.1 TiB 6.3 TiB 6.3 TiB 3.6 GiB 48 GiB 2.8 TiB 69.20 1.01 171 up osd.31
52 hdd 10.00000 1.00000 9.1 TiB 6.2 TiB 6.1 TiB 3.4 GiB 46 GiB 2.9 TiB 67.84 0.99 170 up osd.52
-35 30.00000 - 27 TiB 24 TiB 24 TiB 4.0 GiB 41 GiB 3.5 TiB 87.13 1.27 - host v1d-sata-archive
53 hdd 10.00000 1.00000 9.2 TiB 8.1 TiB 8.0 TiB 1.3 GiB 14 GiB 1.0 TiB 88.54 1.29 81 up osd.53
54 hdd 10.00000 1.00000 9.2 TiB 8.3 TiB 8.2 TiB 1.4 GiB 14 GiB 897 GiB 90.44 1.32 79 up osd.54
55 hdd 10.00000 1.00000 9.1 TiB 7.5 TiB 7.5 TiB 1.3 GiB 13 GiB 1.6 TiB 82.39 1.21 62 up osd.55
the count of PGs on osd.53..55 is less than half that of the other OSDs, but
they are almost full. according to the weights, this should not happen..
What Ceph version are you running? ceph versions
What do you have set as the nearfull ratio? ceph osd dump | grep nearfull
Do you have the ceph balancer enabled? ceph balancer status
What kind of maintenance was going on?
Are the PGs on those OSDs *way* bigger than on those of the other nodes?
Run ceph pg ls-by-osd $osd-id and check the bytes (and OMAP bytes). This
information is only accurate when the PGs have been recently deep-scrubbed.
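To gather the above in one pass, something like this should work (a sketch;
the jq field names assume the Nautilus-or-later JSON layout of pg ls-by-osd,
and osd.53..55 are the suspect OSDs from the df tree above):

  ceph versions
  ceph osd dump | grep ratio      # full, backfillfull and nearfull ratios
  ceph balancer status
  # sum the bytes held by the PGs currently mapped to each suspect OSD
  for osd in 53 54 55; do
    echo -n "osd.$osd: "
    ceph pg ls-by-osd $osd -f json | jq '[.pg_stats[].stat_sum.num_bytes] | add'
  done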
In this case the PG backfilltoofull warning(s) might have been correct.
Yesterday, though, I had no OSDs close to the nearfull ratio and was still
getting the same PG backfilltoofull message ... previously seen due to this
bug [1]. I could fix that by setting upmaps for the affected PGs to another OSD.
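For the record, a manual upmap looks roughly like this (a sketch with a
hypothetical PG ID; pg-upmap-items requires require-min-compat-client to be
luminous or newer):

  # remap PG 7.1a so that osd.53 is replaced by osd.5 in its mapping
  ceph osd pg-upmap-items 7.1a 53 5
  # undo it again later:
  ceph osd rm-pg-upmap-items 7.1a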
any idea why this could be happening, or what to check?
It helps to know what kind of maintenance was going on. Sometimes Ceph PG
mappings are not what you want. There are ways to do maintenance in a
more controlled fashion.
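For example, one common pattern for a planned node downtime is (a sketch;
which flags are appropriate depends on the situation):

  ceph osd set noout        # prevent down OSDs from being marked out
  ceph osd set norebalance  # suppress data movement during the window
  # ... do the maintenance, bring the node and its OSDs back up ...
  ceph osd unset norebalance
  ceph osd unset noout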
thanks a lot in advance for any hints..
Gr. Stefan
[1]: https://tracker.ceph.com/issues/39555