Re: Many pgs inactive after node failure

Hi,

this is another example of why min_size=1/size=2 is a bad choice (if you value your data). There have been plenty of discussions on this list about that, so I won't go into detail here. I'm not familiar with rook, but activating existing OSDs usually works fine [1].
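
Something like the following should do it once the rebuilt host is back in the cluster (a sketch based on [1]; "node3" is just a placeholder hostname, and this assumes cephadm rather than rook is managing the OSDs):

  # re-activate the existing OSDs on the host whose OS was reinstalled
  ceph cephadm osd activate node3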

Regards,
Eugen

[1] https://docs.ceph.com/en/reef/cephadm/services/osd/#activate-existing-osds

Quoting Matthew Booth <mbooth@xxxxxxxxxx>:

I have a 3 node ceph cluster in my home lab. One of the pools spans 3
hdds, one on each node, and has size 2, min size 1. One of my nodes is
currently down, and I have 160 pgs in 'unknown' state. The other 2
hosts are up and the cluster has quorum.

Example `ceph health detail` output:
pg 9.0 is stuck inactive for 25h, current state unknown, last acting []

I have 3 questions:

Why would the pgs be in an unknown state?

I would like to recover the cluster without recovering the failed
node, primarily so that I know I can. Is that possible?

The boot nvme of the host has failed, so I will most likely rebuild
it. I'm running rook, and I will most likely delete the old node and
create a new one with the same name. AFAIK, the OSDs are fine. When
rook rediscovers the OSDs, will it add them back with data intact? If
not, is there any way I can make it so it will?

Thanks!
--
Matthew Booth
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


