Hi,
what exactly is your question? You seem to have made progress in
bringing OSDs back up and reducing inactive PGs. What is unexpected to
me is that a single host failure would cause inactive PGs at all. Can
you share more details about your osd tree and the crush rules of the
pools with the affected inactive PGs? A properly set up ceph cluster
should be resilient to a single host failure, so after your host failed
I would have expected ceph to recover the degraded PGs to other hosts.
Did that recovery not happen?
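To get a better picture, the output of something like the following
would help (using the pg id of one of the inactive PGs):

ceph osd tree
ceph osd pool ls detail
ceph osd crush rule dump
ceph pg <pg.id> query
ceph health detail

That should show how the remaining OSDs are laid out, which crush rule
the affected pools use, and whether recovery is stuck or still running.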
Regards,
Eugen
Quoting Alfredo Rezinovsky <alfrenovsky@xxxxxxxxx>:
I had a problem with a server; its hardware is completely broken.
"ceph orch host rm" hung, even with the force and offline options.
I reinstalled another server with the same IP address and then removed
the OSDs with:
ceph osd purge osd.10
ceph osd purge osd.11
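(For completeness, the full forms of these removal steps are something
like:

ceph orch host rm <hostname> --offline --force
ceph osd purge osd.10 --yes-i-really-mean-it
ceph osd purge osd.11 --yes-i-really-mean-it

with <hostname> being the broken server; purge normally asks for the
--yes-i-really-mean-it confirmation flag.)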
Now I have 0.342% pgs not active.
With
ceph pg <pg.id> query
I can see the PG is blocked by the non-existent osd.10 (or osd.11 in
the other problematic PG).
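To list the stuck PGs and see which OSD blocks them, something like
this works:

ceph pg dump_stuck inactive
ceph pg <pg.id> query | grep -A 3 blocked_by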
I already tried setting
osd_find_best_info_ignore_history_les = false
on the involved OSDs and restarting them, with some luck (I had 3
inactive PGs, now I have 2).
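For reference, the per-OSD way to set and apply this option is
something like the following, with osd.12 as a placeholder for an OSD
in the PG's acting set:

ceph config set osd.12 osd_find_best_info_ignore_history_les true
ceph orch daemon restart osd.12

The option is usually set to true, since that is what makes peering
ignore the last_epoch_started history, and reverted once the PG is
active again.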
Also, after that, another OSD kept restarting. I worked around that by
setting its reweight to 0 and am still waiting for the OSD to empty
before destroying it.
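That is, roughly (with 12 standing in for the flapping OSD's id):

ceph osd reweight 12 0
ceph pg ls-by-osd 12          # wait until no PGs remain on it
ceph osd destroy 12 --yes-i-really-mean-it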
--
Alfrenovsky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx