Hi Stefan,

all daemons are 15.2.15 (I'm considering updating to 15.2.16 today)

> What do you have set as nearfull ratio? ceph osd dump | grep nearfull.

nearfull is 0.87

>
> Do you have the ceph balancer enabled? ceph balancer status

{
    "active": true,
    "last_optimize_duration": "0:00:00.000538",
    "last_optimize_started": "Wed Apr 20 13:02:26 2022",
    "mode": "crush-compat",
    "optimize_result": "Some objects (0.130412) are degraded; try again later",
    "plans": []
}

> What kind of maintenance was going on?

we were replacing a failing memory module (according to the IPMI log, all
errors were corrected, though..)

>
> Are the PGs on those OSDs *way* bigger than on those of the other nodes?
> ceph pg ls-by-osd $osd-id and check for bytes (and OMAP bytes). Only
> accurate information when PGs have been recently deep-scrubbed.

the sizes seem to be roughly similar (each PG is between 65-75GB), but when
I sum them up, the total for osd.5 is almost twice that of osd.53-osd.55
(the PS below shows how I'm summing them). they haven't been deep-scrubbed
due to the ongoing recovery, though.. but the OMAP sizes shouldn't make
such a difference..

>
> In this case the PG backfilltoofull warning(s) might have been correct.
> Yesterday though, I had no OSDs close to near full ratio and was getting the
> same PG backfilltoofull message ... previously seen due to this bug [1]. I
> could fix that by setting upmaps for the affected PGs to another OSD.

the warning is correct, but the usage value seems to be wrong.. one thing
that comes to mind: there seem to be a lot of PGs waiting for snaptrim (see
the PPS below).. I'll let it keep snaptrimming for some time and see
whether the usage drops...

>
> >
> > any idea on why this could be happening or what to check?
>
> It helps to know what kind of maintenance was going on. Sometimes Ceph PG
> mappings are not what you want. There are ways to do maintenance in a more
> controlled fashion.

the maintenance itself wasn't ceph related, so it shouldn't have caused any
PG movements.. one thing to note: I added an SSD volume for all OSD DBs to
speed up recovery, but we'd had this problem before that, so I don't think
it's the culprit..

BR

nik

>
> >
> > thanks a lot in advance for hints..
>
> Gr. Stefan
>
> [1]: https://tracker.ceph.com/issues/39555
>
> --

-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------
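
PS: for reference, this is roughly how I'm summing the PG sizes per OSD.
It's a quick sketch that assumes the Octopus JSON layout of ceph pg
ls-by-osd (pg_stats[].stat_sum.num_bytes, with num_omap_bytes next to it)
and that jq is installed; as you said, the numbers are only as fresh as
the last deep-scrub:

  # sum the reported PG bytes for each of the OSDs in question
  for osd in 5 53 54 55; do
      echo -n "osd.$osd: "
      ceph pg ls-by-osd $osd -f json | jq '[.pg_stats[].stat_sum.num_bytes] | add'
  done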
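
PPS: and this is how I'm counting the snaptrim backlog, to see whether it
actually drains. Another sketch: it assumes ceph pg dump pgs_brief prints
the PG state in the second column, and it has to keep plain "snaptrim"
apart from "snaptrim_wait":

  # count PGs currently trimming vs. queued for trimming
  ceph pg dump pgs_brief 2>/dev/null | awk '
      $2 ~ /snaptrim_wait/                     { wait++ }
      $2 ~ /snaptrim/ && $2 !~ /snaptrim_wait/ { trim++ }
      END { printf "snaptrim: %d  snaptrim_wait: %d\n", trim, wait }'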
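
PPPS: regarding the upmap workaround you mention for [1], I assume you
mean pg-upmap-items? i.e. something along these lines (the PG and OSD ids
here are made up for illustration, and this needs the cluster's min compat
client to be luminous or later):

  # map one replica of PG 2.1a away from the too-full osd.5 onto osd.53
  ceph osd pg-upmap-items 2.1a 5 53
  # and drop the exception again once recovery has settled
  ceph osd rm-pg-upmap-items 2.1a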