This is related to https://tracker.ceph.com/issues/42341 and to http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037017.html
After closer inspection yesterday we found that PGs are not being removed from OSDs, which then leads to nearfull errors and explains why the reweights don't work. This is a BIG issue because I have to intervene manually all the time to keep the cluster from dying.
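For reference, this is roughly how I check which PGs are still sitting on a given OSD (osd.53 below is just an example ID):

  # per-OSD utilization and PG counts
  ceph osd df tree
  # list the PGs currently mapped to one OSD
  ceph pg ls-by-osd osd.53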
Version 14.2.4, fresh setup, all defaults.
The PG balancer is turned off now; I'm beginning to wonder if it's at fault.
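For completeness, it was disabled via the mgr module, along these lines:

  # check the current balancer mode/state
  ceph balancer status
  # disable automatic balancing
  ceph balancer off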
My crush map: https://termbin.com/3t8l
It was mentioned there that the bucket weights look WEIRD. I never touched them.
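In case someone wants to reproduce the exact view, the map was dumped roughly like this:

  # quick look at bucket/item weights
  ceph osd crush tree
  # full decompiled map, which is what the termbin paste should correspond to
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt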
The CRUSH weights that are unusual are the one for the nearfull osd.53 and a few that were set to 10 during a previous manual intervention.
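If resetting those is the right move, the idea would be to put the CRUSH weight back to the raw disk size in TiB, something like this (osd.53 and 9.09 are placeholders, not the real values for every disk):

  # CRUSH weight normally reflects the device size in TiB
  ceph osd crush reweight osd.53 9.09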
That the PGs are not being purged is one issue; the original issue is why on earth Ceph fills ONLY my nearfull OSDs in the first place. It always seems to pick the fullest OSD to write more data onto. If I reweight that one, it starts alerting for the next almost-full OSD because it intends to write everything there, even though everything else is only at about 60%.
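So far the workaround has been the manual reweight dance, roughly this (assuming the override reweight, and not the CRUSH weight, is the right knob here):

  # dry run: see what utilization-based reweighting would change
  ceph osd test-reweight-by-utilization
  # apply it, lowering the override weight of overfull OSDs
  ceph osd reweight-by-utilization
  # or adjust a single OSD manually (override weight is 0.0-1.0)
  ceph osd reweight osd.53 0.85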
I don't know how to debug this; it's a MAJOR PITA.
I hope someone has an idea, because I can't fight this 24/7 and I'm getting pretty tired of it.
Thanks