Hi,
what exactly is your question? You seem to have made progress in
bringing OSDs back up and reducing inactive PGs. What is unexpected to
me is that a single host failure would cause inactive PGs at all. Can
you share more details about your osd tree and the crush rules of the
pools with the affected inactive PGs? A properly set up ceph cluster
should be resilient to a single host failure, so after your host failed
I would have expected ceph to recover the degraded PGs to other hosts.
Did that recovery not happen?
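To get a better picture, the output of something like the following
would help (using the pg id of one of the inactive PGs):

ceph osd tree
ceph osd pool ls detail
ceph osd crush rule dump
ceph pg <pg.id> query
ceph health detail

That should show how the remaining OSDs are laid out, which crush rule
the affected pools use, and whether recovery is stuck or still running.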
Regards,
Eugen
Quoting Alfredo Rezinovsky <alfrenovsky@xxxxxxxxx>:
I had a problem with a server; its hardware is completely broken.
"ceph orch host rm" hung, even with the force and offline options.
I reinstalled another server with the same IP address and then removed
the OSDs with:
ceph osd purge osd.10
ceph osd purge osd.11
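(For completeness, the full forms of these removal steps are something
like:

ceph orch host rm <hostname> --offline --force
ceph osd purge osd.10 --yes-i-really-mean-it
ceph osd purge osd.11 --yes-i-really-mean-it

with <hostname> being the broken server; purge normally asks for the
--yes-i-really-mean-it confirmation flag.)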
Now I have 0.342% pgs not active.
With
ceph pg <pg.id> query
I can see the PG is blocked by the non-existent osd.10 (or osd.11 in
the other problematic PG).
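To list the stuck PGs and see which OSD blocks them, something like
this works:

ceph pg dump_stuck inactive
ceph pg <pg.id> query | grep -A 3 blocked_by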
I already tried setting
osd_find_best_info_ignore_history_les = false
on the involved OSDs and restarting them, with some luck (I had 3
inactive PGs, now I have 2).
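For reference, the per-OSD way to set and apply this option is
something like the following, with osd.12 as a placeholder for an OSD
in the PG's acting set:

ceph config set osd.12 osd_find_best_info_ignore_history_les true
ceph orch daemon restart osd.12

The option is usually set to true, since that is what makes peering
ignore the last_epoch_started history, and reverted once the PG is
active again.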
Also, after that, another OSD kept restarting. I worked around that by
setting its reweight to 0 and am still waiting for the OSD to empty
before destroying it.
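That is, roughly (with 12 standing in for the flapping OSD's id):

ceph osd reweight 12 0
ceph pg ls-by-osd 12          # wait until no PGs remain on it
ceph osd destroy 12 --yes-i-really-mean-it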
--
Alfrenovsky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx