Re: Many pgs inactive after node failure

On Sun, 5 Nov 2023 at 10:05, Eugen Block <eblock@xxxxxx> wrote:
>
> Hi,
>
> this is another example of why min_size 1/size 2 are a bad choice (if you
> value your data). There have been plenty of discussions on this list
> about that, so I'm not going into detail here. I'm not familiar
> with rook, but activating existing OSDs usually works fine [1].

Just to follow up on this. I've known for some time that replicas
2/min_size 1 was a bad idea, and fixing it has been on my todo list, but
it never seemed to rise to the top. Having recovered my data, I have now
fixed it! Or at least I've reconfigured the affected pools, and it will
be fixed once the backfill completes in a day or so.
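
In case it's useful to anyone else finding this thread later, the change
itself is just a couple of pool settings per pool, roughly like this (the
pool name is a placeholder; exact values depend on your setup):

    ceph osd pool set <pool> size 3
    ceph osd pool set <pool> min_size 2

    # then watch the backfill progress and confirm the settings took effect
    ceph -s
    ceph osd pool ls detail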

The NVMe popped after an unexpected power failure. The HDDs on the
other 2 nodes are external and... I didn't notice that the enclosures
hadn't switched back on, because the nodes were up and the SSDs were
available. I had NO OSDs up in this pool and I hadn't noticed 🤦 When
I switched the HDD enclosures on, the OSDs came up and all pgs left
the unknown state and became 'down'. I still don't fully understand
why that is, but as I've switched to replicas 3/min_size 2 I'm hoping
the situation won't come up again.
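
For anyone in the same spot, the state is visible with the usual commands,
something along these lines:

    ceph osd tree                 # which OSDs are down/out, and on which host
    ceph pg dump_stuck inactive   # pgs stuck in unknown/down
    ceph pg <pgid> query          # per-pg detail; only answers once an OSD
                                  # holding the pg is up again

(When a pg is 'unknown', no OSD is currently reporting it at all, which is
why the per-pg query can't tell you much until something comes back up.)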

I rebuilt the node with the failed NVMe. Rook detected the existing
OSDs as expected and re-added them to the cluster. All pgs became
available immediately, and once the cluster was healthy I reconfigured
the pools with replicas 3, after checking I had enough space.
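
For the space check before bumping the replica count, the usual output is
enough, something like:

    ceph df       # per-pool stored vs. max available
    ceph osd df   # per-OSD utilisation, to check nothing gets too full

Worth remembering that going from size 2 to size 3 increases a pool's raw
usage by about 50%, so the headroom needs to be there before the backfill
starts.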

Thanks!

Matt

>
> Regards,
> Eugen
>
> [1] https://docs.ceph.com/en/reef/cephadm/services/osd/#activate-existing-osds
>
> Zitat von Matthew Booth <mbooth@xxxxxxxxxx>:
>
> > I have a 3 node ceph cluster in my home lab. One of the pools spans 3
> > hdds, one on each node, and has size 2, min size 1. One of my nodes is
> > currently down, and I have 160 pgs in 'unknown' state. The other 2
> > hosts are up and the cluster has quorum.
> >
> > Example `ceph health detail` output:
> > pg 9.0 is stuck inactive for 25h, current state unknown, last acting []
> >
> > I have 3 questions:
> >
> > Why would the pgs be in an unknown state?
> >
> > I would like to recover the cluster without recovering the failed
> > node, primarily so that I know I can. Is that possible?
> >
> > The boot nvme of the host has failed, so I will most likely rebuild
> > it. I'm running rook, and I will most likely delete the old node and
> > create a new one with the same name. AFAIK, the OSDs are fine. When
> > rook rediscovers the OSDs, will it add them back with data intact? If
> > not, is there any way I can make it so it will?
> >
> > Thanks!
> > --
> > Matthew Booth
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>


-- 
Matthew Booth
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
