Hi Eugen,

> thanks for your explanation, Josh. I think I understand now how
> mon_max_pg_per_osd could have an impact here. The default seems to be
> 250. Each OSD currently has around 100 PGs, is this a potential
> bottleneck?

It could be, yes. I've seen a case on a test cluster where thousands of
PGs were assigned to a single OSD even when the steady state was far
fewer than that.

> I'll add the rule in question at the bottom, do you see a potential
> issue there?

It does choose a host, which is similar to the case I had in mind.
(Though in my case the OSDs weren't purged and thus the host weight was
high, which sounds potentially different from your procedure...)

> If I increase mon_max_pg_per_osd temporarily to let's say 500, would
> this decrease the risk?

That's how I got around this issue in my test env. However, another way
to do this would be not to create the OSDs one by one at full weight,
but rather to bring them back at weight 0 and then upweight them all
bit by bit (or maybe even all at once would work?) to avoid the
temporary state.

> And draining the OSDs before purging and rebuilding doesn't mean the
> same can happen again if the OSDs join the cluster, right?

Right, because it's an issue of up-set assignment.

Everything above is of course speculation unless you catch this in the
act again...

Josh
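
P.S. For the temporary bump, on a reasonably recent release something
along these lines should work -- treat it as a sketch and double-check
the option scope and syntax for your version:

    ceph config get mon mon_max_pg_per_osd          # confirm the current limit (default 250)
    ceph config set global mon_max_pg_per_osd 500   # raise it while the OSDs are recreated/backfilling
    ceph config rm global mon_max_pg_per_osd        # revert to the default once things settle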
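
For the zero-weight approach, you can either have new OSDs join CRUSH
at weight 0 via osd_crush_initial_weight, or zero/raise the weight by
hand. osd.12 and the 1.82 target below are just placeholders for your
own OSD IDs and per-OSD CRUSH weight:

    ceph config set osd osd_crush_initial_weight 0  # new OSDs join CRUSH with weight 0
    ceph osd crush reweight osd.12 0                # or zero an existing OSD manually
    ceph osd crush reweight osd.12 1.82             # then step the weight back up in increments
    ceph osd df tree                                # watch the PGS column per OSD as you go

(And remember to unset osd_crush_initial_weight again afterwards,
otherwise future OSDs will also come in at weight 0.)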