Basically, these are the steps to remove all OSDs from that host (the OSDs
are not "replaced", so they are not marked "destroyed") [1]; a scripted
sketch of the loop follows the list:
1) ceph osd out $id
2) systemctl stop ceph-osd@$id
3) ceph osd purge $id --yes-i-really-mean-it
4) ceph-volume lvm zap --osd-id $id --destroy
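This is not how DeepSea runs it internally, just a minimal shell sketch of
that loop; it assumes it is run on the OSD host itself with an admin keyring
present, and that the CRUSH host bucket matches the short hostname:

  # collect the OSD IDs that live under this host's CRUSH bucket
  HOST=$(hostname -s)
  for id in $(ceph osd ls-tree "$HOST"); do
      ceph osd out "$id"                              # 1) mark the OSD out
      systemctl stop ceph-osd@"$id"                   # 2) stop the daemon
      ceph osd purge "$id" --yes-i-really-mean-it     # 3) remove it from CRUSH and the osdmap
      ceph-volume lvm zap --osd-id "$id" --destroy    # 4) wipe the backing LVs
  done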
After all disks have been wiped, a Salt runner deploys all available OSDs
on that host again [2]. All OSDs are created with their normal weight. All
of the OSD restarts I did were on different hosts, not on the rebuilt host.
The only difference I can think of that might have an impact is that this
cluster consists of two datacenters, while the others were not divided into
several buckets. Could that be an issue?
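To judge whether the two-datacenter layout plays a role, it might help to
look at the CRUSH hierarchy and at the rules used by the affected pools; for
example (the pool name is just a placeholder):

  ceph osd tree                              # shows the datacenter/host buckets and weights
  ceph osd crush rule dump                   # shows the failure domain each rule uses
  ceph osd pool get <poolname> crush_rule    # which rule a given pool uses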
[1] https://github.com/SUSE/DeepSea/blob/master/srv/modules/runners/osd.py#L179
[2] https://github.com/SUSE/DeepSea/blob/master/srv/salt/_modules/dg.py#L1396
Quoting Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>:
On Wed, Apr 6, 2022 at 11:20 AM Eugen Block <eblock@xxxxxx> wrote:
I'm pretty sure that their cluster isn't anywhere near the limit for
mon_max_pg_per_osd; they currently have around 100 PGs per OSD, and the
configs have not been touched, so it's a pretty basic setup.
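Just for completeness, those figures can be double-checked with standard
commands (nothing DeepSea-specific assumed here):

  ceph config get mon mon_max_pg_per_osd    # configured per-OSD PG limit
  ceph osd df                               # PGS column shows the actual PGs per OSD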
How is the host being "rebuilt"? Depending on the CRUSH rule, if the
host's OSDs are all marked destroyed and then re-created one at a time
with normal weight, CRUSH may decide to put a large number of PGs on
the first OSD that is created, and so on, until the rest of the host's
OSDs are available to take those PGs.
Josh
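One way to see whether that is what happens here would be to watch the PG
distribution while the rebuilt host's OSDs come back one by one, e.g. (the
OSD id is a placeholder):

  ceph osd df tree             # PGS column per OSD, grouped by CRUSH bucket
  ceph pg ls-by-osd osd.<id>   # PGs currently mapped to a specific OSD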
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx