Re: stuck stale+undersized+degraded PG after removing 3 OSDs

Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> · Wed, 5 Jun 2019 13:57:52 -0400

On Wed, Jun 5, 2019 at 1:36 PM Sameh <sameh+ceph-users@xxxxxxxxxxxxxxx> wrote:
>
> Hello cephers,
>
> I was trying to reproduce a production situation involving a stuck stale PG.
>
> While playing with a test cluster, I aggressively removed 3 OSDs at once
> from the cluster. One OSD per host. All pools are size 3.
>
> After re-adding them, I ended up in this situation (PG unfound, or acting on one
> OSD, or another, depending on which command you run):
> https://rentry.co/6zwof
>
> I am not sure of the next steps to unblock this. Marking OSD 11 down didn't
> help.
>
>
> Cheers,

I run into this in a lab sometimes. When it happens, I run

ceph osd set noout

and reboot the node with the stuck PG.
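
A rough sketch of that sequence as a checklist, in case it helps. Only the
"set noout" and the reboot come from the procedure above. Clearing the flag
afterwards and listing stuck PGs are assumptions about a sensible follow-up,
and noout_reboot_plan is a hypothetical helper name, not a Ceph tool. The
script only prints the commands; it does not touch the cluster.

```shell
#!/bin/sh
# Sketch only: prints a command checklist, nothing is executed on the cluster.

# Hypothetical helper (not part of Ceph): the noout + reboot sequence.
noout_reboot_plan() {
    cat <<'EOF'
ceph osd set noout
# reboot the node holding the stuck PG, then wait for its OSDs to come back up
ceph osd unset noout
ceph pg dump_stuck stale
EOF
}

# Print the checklist so each step can be run by hand, in order.
noout_reboot_plan
```

The noout flag keeps the OSDs on the rebooting node from being marked out, so
the cluster does not start rebalancing while the node is down. The last two
lines are optional: unset the flag once the OSDs are back, then check whether
any PG is still reported as stale.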

In production, we remove OSDs one by one.

--
Alex Gorbachev
Intelligent Systems Services Inc.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com