This is related to https://tracker.ceph.com/issues/42341 and to http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037017.html
After closer inspection yesterday we found that PGs are not being removed from OSDs, which then leads to nearfull errors and explains why the reweights don't work. This is a BIG issue because I have to intervene manually all the time to keep the cluster from dying.
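For reference, this is roughly how I check which PGs are still sitting on a given OSD (osd.53 below is just an example ID):

  # per-OSD utilization and PG counts
  ceph osd df tree
  # list the PGs currently mapped to one OSD
  ceph pg ls-by-osd osd.53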
Version 14.2.4, fresh setup, all defaults.
The PG balancer is turned off now; I'm beginning to wonder if it's at fault.
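For completeness, it was disabled via the mgr module, along these lines:

  # check the current balancer mode/state
  ceph balancer status
  # disable automatic balancing
  ceph balancer off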
My crush map: https://termbin.com/3t8l
It was mentioned there that the bucket weights look WEIRD. I never touched them.
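In case someone wants to reproduce the exact view, the map was dumped roughly like this:

  # quick look at bucket/item weights
  ceph osd crush tree
  # full decompiled map, which is what the termbin paste should correspond to
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt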
The CRUSH weights that are unusual are the one for the nearfull osd.53 and a few that were set to 10 during a previous manual intervention.
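If resetting those is the right move, the idea would be to put the CRUSH weight back to the raw disk size in TiB, something like this (osd.53 and 9.09 are placeholders, not the real values for every disk):

  # CRUSH weight normally reflects the device size in TiB
  ceph osd crush reweight osd.53 9.09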
That the PGs are not being purged is one issue; the original issue is why on earth Ceph fills ONLY my nearfull OSDs in the first place. It always seems to pick the fullest OSD to write more data onto. If I reweight that one, it starts alerting for the next almost-full OSD because it intends to write everything there, even though everything else is only at about 60%.
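So far the workaround has been the manual reweight dance, roughly this (assuming the override reweight, and not the CRUSH weight, is the right knob here):

  # dry run: see what utilization-based reweighting would change
  ceph osd test-reweight-by-utilization
  # apply it, lowering the override weight of overfull OSDs
  ceph osd reweight-by-utilization
  # or adjust a single OSD manually (override weight is 0.0-1.0)
  ceph osd reweight osd.53 0.85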
I don't know how to debug this; it's a MAJOR PITA.
I hope someone has an idea, because I can't fight this 24/7 and I'm getting pretty tired of it.
Thanks