Hi Eugen,

> thanks for your explanation, Josh. I think I understand now how
> mon_max_pg_per_osd could have an impact here. The default seems to be
> 250. Each OSD currently has around 100 PGs, is this a potential
> bottleneck?

It could be, yes. I've seen a case on a test cluster where thousands of
PGs were assigned to a single OSD even when the steady state was far
fewer than that.

> I'll add the rule in question at the bottom, do you see a potential
> issue there?

It does choose a host, which is similar to the case I had in mind.
(Though in my case the OSDs weren't purged and thus the host weight was
high, which sounds potentially different from your procedure...)

> If I increase mon_max_pg_per_osd temporarily to let's say 500, would
> this decrease the risk?

That's how I got around this issue in my test env. However, another way
to do this would be not to create the OSDs one by one at full weight,
but rather to bring them back at weight 0 and then upweight them all
bit by bit (or maybe even all at once would work?) to avoid the
temporary state.

> And draining the OSDs before purging and rebuilding doesn't mean the
> same can happen again if the OSDs join the cluster, right?

Right, because it's an issue of up-set assignment.

Everything above is of course speculation unless you catch this in the
act again...

Josh
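
P.S. For the temporary bump, on a reasonably recent release something
along these lines should work -- treat it as a sketch and double-check
the option scope and syntax for your version:

    ceph config get mon mon_max_pg_per_osd          # confirm the current limit (default 250)
    ceph config set global mon_max_pg_per_osd 500   # raise it while the OSDs are recreated/backfilling
    ceph config rm global mon_max_pg_per_osd        # revert to the default once things settle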
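
For the zero-weight approach, you can either have new OSDs join CRUSH
at weight 0 via osd_crush_initial_weight, or zero/raise the weight by
hand. osd.12 and the 1.82 target below are just placeholders for your
own OSD IDs and per-OSD CRUSH weight:

    ceph config set osd osd_crush_initial_weight 0  # new OSDs join CRUSH with weight 0
    ceph osd crush reweight osd.12 0                # or zero an existing OSD manually
    ceph osd crush reweight osd.12 1.82             # then step the weight back up in increments
    ceph osd df tree                                # watch the PGS column per OSD as you go

(And remember to unset osd_crush_initial_weight again afterwards,
otherwise future OSDs will also come in at weight 0.)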