Hi fellow ceph users and developers,

we've got into quite a strange situation and I'm not sure it isn't a ceph bug.

We have a 4-node ceph cluster with multiple pools. One of them is a SATA EC 2+2 pool spanning 4x3 10TB drives (one of them is actually 12TB).

One day, after a planned downtime of the fourth node, we got into a strange state where there seemed to be a large number of degraded PGs to recover (even though we had noout set for the duration of the downtime). The weird thing was that the OSDs of that node appeared to be almost full (i.e. ~80%) while there were almost no PGs on them according to "ceph osd df tree", leading to backfill_toofull. After some experimenting I dropped and recreated those OSDs, but during the recovery we got into the same state:

ID  CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-31        120.00000         -  112 TiB   81 TiB   80 TiB   36 GiB  456 GiB   31 TiB  72.58  1.06    -          root sata-archive
-32         30.00000         -   29 TiB   20 TiB   20 TiB   10 GiB  133 GiB  9.5 TiB  67.48  0.99    -              host v1a-sata-archive
  5  hdd    10.00000   1.00000  9.2 TiB  6.2 TiB  6.1 TiB  3.5 GiB   47 GiB  3.0 TiB  67.78  0.99  171      up          osd.5
 10  hdd    10.00000   1.00000  9.2 TiB  6.2 TiB  6.2 TiB  3.6 GiB   48 GiB  2.9 TiB  68.06  1.00  171      up          osd.10
 13  hdd    10.00000   1.00000   11 TiB  7.3 TiB  7.3 TiB  3.2 GiB   38 GiB  3.6 TiB  66.73  0.98  170      up          osd.13
-33         30.00000         -   27 TiB   19 TiB   18 TiB   11 GiB  139 GiB  9.0 TiB  67.39  0.99    -              host v1b-sata-archive
 19  hdd    10.00000   1.00000  9.2 TiB  6.1 TiB  6.1 TiB  3.5 GiB   46 GiB  3.0 TiB  67.11  0.98  171      up          osd.19
 28  hdd    10.00000   1.00000  9.2 TiB  6.1 TiB  6.0 TiB  3.5 GiB   46 GiB  3.1 TiB  66.44  0.97  170      up          osd.28
 29  hdd    10.00000   1.00000  9.2 TiB  6.3 TiB  6.2 TiB  3.6 GiB   48 GiB  2.9 TiB  68.61  1.00  171      up          osd.29
-34         30.00000         -   27 TiB   19 TiB   19 TiB   11 GiB  143 GiB  8.6 TiB  68.65  1.00    -              host v1c-sata-archive
 30  hdd    10.00000   1.00000  9.2 TiB  6.3 TiB  6.2 TiB  3.8 GiB   48 GiB  2.8 TiB  68.91  1.01  171      up          osd.30
 31  hdd    10.00000   1.00000  9.1 TiB  6.3 TiB  6.3 TiB  3.6 GiB   48 GiB  2.8 TiB  69.20  1.01  171      up          osd.31
 52  hdd    10.00000   1.00000  9.1 TiB  6.2 TiB  6.1 TiB  3.4 GiB   46 GiB  2.9 TiB  67.84  0.99  170      up          osd.52
-35         30.00000         -   27 TiB   24 TiB   24 TiB  4.0 GiB   41 GiB  3.5 TiB  87.13  1.27    -              host v1d-sata-archive
 53  hdd    10.00000   1.00000  9.2 TiB  8.1 TiB  8.0 TiB  1.3 GiB   14 GiB  1.0 TiB  88.54  1.29   81      up          osd.53
 54  hdd    10.00000   1.00000  9.2 TiB  8.3 TiB  8.2 TiB  1.4 GiB   14 GiB  897 GiB  90.44  1.32   79      up          osd.54
 55  hdd    10.00000   1.00000  9.1 TiB  7.5 TiB  7.5 TiB  1.3 GiB   13 GiB  1.6 TiB  82.39  1.21   62      up          osd.55

The count of PGs on osd.53-55 is less than half that of the other OSDs, yet they are almost full (roughly 100 GiB of data per PG on osd.53 versus ~37 GiB per PG on osd.5). According to the weights, this should not happen.

Any idea why this could be happening, or what to check? Thanks a lot in advance for any hints.

with best regards

nikola ciprich

--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------
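P.S. for completeness, a minimal sketch of the ceph CLI commands that can be used to cross-check the CRUSH rule and the PG-to-OSD mapping for this pool. The pool name "sata-archive" below is only a placeholder (substitute the real pool name); the commands themselves are standard:

    # which CRUSH rule does the EC pool use, and what does that rule select?
    # ("sata-archive" is a placeholder pool name)
    ceph osd pool get sata-archive crush_rule
    ceph osd crush rule dump

    # which PGs are currently mapped to one of the underfilled-but-full OSDs?
    ceph pg ls-by-osd osd.53

    # per-PG stats for the pool, to spot unusually large PGs
    ceph pg ls-by-pool sata-archive

    # is the balancer / pg-upmap skewing placement?
    ceph balancer status
    ceph osd dump | grep upmap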