Re: Ceph PGs stuck inactive after rebuild node

Just to update this thread: apparently you were right, we did hit the limit of mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio (250 * 3 = 750). This was found in the logs:

2022-04-06 14:24:55.256 7f8bb5a0e700 1 osd.8 43377 maybe_wait_for_max_pg withhold creation of pg 75.56s16: 750 >= 750

This message first appeared for the last remaining up OSD on that host after all of the other OSDs had been purged, and then again for the first OSD that came up after the rebuild. I'm currently playing around with the osdmaptool; I have a feeling that this could also be an issue in newer releases, but that is just speculation at the moment. As a workaround we'll increase osd_max_pg_per_osd_hard_ratio to 5 and see how the next attempt goes.
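
For anyone hitting the same, this is roughly how to check the limits, apply the workaround and test the mapping offline (a sketch, assuming a release with the centralized config store, i.e. Mimic or later; pool 75 is taken from the log line above):

  # check the current limits (defaults: 250 and 3)
  ceph config get mon mon_max_pg_per_osd
  ceph config get osd osd_max_pg_per_osd_hard_ratio

  # workaround: allow an OSD to temporarily hold up to 250 * 5 = 1250 PGs
  ceph config set osd osd_max_pg_per_osd_hard_ratio 5

  # dump the current osdmap and simulate the PG mappings offline
  ceph osd getmap -o /tmp/osdmap
  osdmaptool /tmp/osdmap --test-map-pgs --pool 75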

Thanks,
Eugen


Quoting Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>:

On Wed, Apr 6, 2022 at 11:20 AM Eugen Block <eblock@xxxxxx> wrote:
I'm pretty sure that their cluster isn't anywhere near the limit for
mon_max_pg_per_osd; they currently have around 100 PGs per OSD, and the
configs have not been touched, it's a pretty basic setup.

How is the host being "rebuilt"? Depending on the CRUSH rule, if the
host's OSDs are all marked destroyed and then re-created one at a time
with normal weight, CRUSH may decide to put a large number of PGs on
the first OSD that is created, and so on, until the rest of the host's
OSDs are available to take those PGs.
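
One way to avoid that pile-up is to let the re-created OSDs join with
zero CRUSH weight and only weight them in once the whole host is back,
e.g. (a sketch; whether it fits this rebuild procedure is an
assumption, and the weight value below is hypothetical):

  # have newly created OSDs join with zero CRUSH weight
  ceph config set osd osd_crush_initial_weight 0

  # once all OSDs on the host exist, weight them in
  ceph osd crush reweight osd.8 3.63899   # hypothetical weight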

Josh


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


