Re: Ceph PGs stuck inactive after rebuild node

Hi Eugen,

> thanks for your explanation, Josh. I think I understand now how
> mon_max_pg_per_osd could have an impact here. The default seems to be
> 250. Each OSD currently has around 100 PGs, is this a potential
> bottleneck?

It could be, yes. I've seen a case on a test cluster where thousands
of PGs were assigned to a single OSD, even though the steady-state
count was far lower than that.
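
(In case you want to catch it in the act: the PGS column of "ceph osd
df tree" shows the live per-OSD PG count, so something along these
lines, run while the OSDs are being recreated, should show the spike.
The interval is only an example:)

    # watch the per-OSD PG counts (PGS column) during the rebuild
    watch -n 10 ceph osd df tree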

> I'll add the rule in question at the bottom, do you see a potential
> issue there?

It does choose a host, which is similar to the case I had in mind.
(Though in my case the OSDs weren't purged, so the host weight
remained high, which sounds potentially different from your procedure...)

> If I increase mon_max_pg_per_osd temporarily to let's say 500 would
> this decrease the risk?

That's how I got around this issue in my test env. However, another
way to do it would be to not recreate the OSDs one by one at full
weight, but rather bring them back at weight 0 and then upweight them
all bit by bit (or maybe even all at once would work?) to avoid that
temporary over-assignment.
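
Roughly what I mean, as a sketch (the values and osd.12 are only
examples, and using osd_crush_initial_weight for the 0-weight part is
just one way to do it):

    # temporarily relax the per-OSD PG limit
    ceph config set global mon_max_pg_per_osd 500

    # or: have newly created OSDs start at CRUSH weight 0 ...
    ceph config set osd osd_crush_initial_weight 0

    # ... and then raise them step by step once they are up,
    # e.g. for one of them (final weight depends on the disk size)
    ceph osd crush reweight osd.12 1.0
    ceph osd crush reweight osd.12 3.6

    # revert the limit once backfill has settled
    ceph config set global mon_max_pg_per_osd 250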

> And draining the OSDs before purging and rebuilding doesn't mean the
> same can happen again if the OSDs join the cluster, right?

Right, because it's an issue of up-set assignment.

Everything above is of course speculation unless you catch this in the
act again...

Josh
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


