Re: [Ceph incident] PG stuck in peering.

Frank Schilder <frans@xxxxxx> · Thu, 26 Sep 2024 09:34:50 +0000

Hi Loan,

thanks for the detailed post-mortem to the list!

Unfortunately, I misread your first message. On our cluster we also had issues with 1-2 PGs stuck in peering, which resulted in blocked IO and warnings piling up. We identified the "bad" OSD by shutting down the PG's member OSDs one at a time and marking each one out, so that it was in state down+out. As soon as the bad OSD was down+out, the PG recovered and became active. In our case the disks were bad, and we replaced them.
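For anyone who wants to script the same one-at-a-time elimination, here is a minimal Python sketch. It does not talk to the cluster. `find_bad_osd`, `commands_for`, `revert_commands_for` and the `pg_active_without` callback are hypothetical names. The callback stands in for the manual steps: stop the OSD, `ceph osd out <id>`, wait for peering, then check the PG state. The PG's acting set can be read from `ceph pg map <pgid>`. The `systemctl ... ceph-osd@<id>` commands assume a package-based deployment and must run on the OSD's host. On a cephadm cluster you would use `ceph orch daemon stop osd.<id>` instead.

```python
from typing import Callable, List, Optional, Sequence


def commands_for(osd_id: int, pgid: str) -> List[str]:
    """Hypothetical helper: manual steps to test one member OSD.

    Assumes a package-based deployment (systemd unit ceph-osd@<id>);
    the systemctl command has to run on the host carrying the OSD.
    """
    return [
        f"systemctl stop ceph-osd@{osd_id}",  # take the OSD down
        f"ceph osd out {osd_id}",             # mark it out, forcing a remap
        f"ceph pg {pgid} query",              # inspect whether the PG went active
    ]


def revert_commands_for(osd_id: int) -> List[str]:
    """Hypothetical helper: bring a healthy OSD back before testing the next."""
    return [
        f"systemctl start ceph-osd@{osd_id}",
        f"ceph osd in {osd_id}",
    ]


def find_bad_osd(acting: Sequence[int],
                 pg_active_without: Callable[[int], bool]) -> Optional[int]:
    """Return the first OSD whose down+out state lets the PG become active.

    pg_active_without(osd) stands in for running commands_for(osd, pgid)
    and checking the PG state; it is a hypothetical callback, not a Ceph API.
    Returns None if no single member OSD explains the stuck PG.
    """
    for osd in acting:
        if pg_active_without(osd):
            return osd  # leave this OSD down+out; its disk is the suspect
        # Not the culprit: the operator would run revert_commands_for(osd)
        # and let the PG settle before testing the next member.
    return None


if __name__ == "__main__":
    # Simulated run: pretend osd.7 is the bad member of acting set [3, 7, 12].
    print(find_bad_osd([3, 7, 12], lambda osd: osd == 7))
    print(commands_for(7, "2.1a"))  # "2.1a" is a made-up PG id
```

Testing strictly one member at a time means at most one extra OSD of the PG is out at any moment, so redundancy is never reduced by more than one OSD during the search.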

I thought you had also taken OSDs down+out, but on re-reading I see that you only restarted them, which does not force a remapping. Sorry for the confusion, and hopefully our experience reports here help other users.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx