Many pgs inactive after node failure

I have a 3-node Ceph cluster in my home lab. One of the pools spans 3
HDDs, one on each node, and has size 2, min_size 1. One of my nodes is
currently down, and I have 160 PGs in 'unknown' state. The other 2
hosts are up and the cluster has quorum.
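
For reference, the pool's replication settings can be double-checked
with something like the following (the pool name is a placeholder):

ceph osd pool get <pool-name> size
ceph osd pool get <pool-name> min_size
ceph osd pool get <pool-name> crush_rule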

Example `ceph health detail` output:
pg 9.0 is stuck inactive for 25h, current state unknown, last acting []

I have 3 questions:

Why would the PGs be in an unknown state?
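
If it helps, I can gather more detail with something along these lines
(using the example pg from above; I gather 'pg query' may not return
anything useful while last acting is empty):

ceph pg dump_stuck inactive
ceph pg map 9.0
ceph pg 9.0 query
ceph osd tree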

I would like to recover the cluster without recovering the failed
node, primarily so that I know I can. Is that possible?
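
If so, is it just a matter of marking the down OSDs on that host out
and letting the data backfill onto the remaining disks, e.g. something
like the following (the OSD id is a placeholder):

ceph osd out <osd-id>
# or, if the OSD were never coming back:
ceph osd purge <osd-id> --yes-i-really-mean-it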

The host's boot NVMe has failed, so I will most likely rebuild it. I'm
running Rook, and my plan is to delete the old node and create a new
one with the same name. AFAIK the OSDs themselves are fine. When Rook
rediscovers the OSDs, will it add them back with their data intact? If
not, is there any way I can make it do so?
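
For what it's worth, I'm assuming that once the node is reinstalled (or
from a live environment) I can at least confirm the OSD data is still
on the disks with something like the following (the device path is
just an example):

ceph-volume lvm list
ceph-bluestore-tool show-label --dev /dev/sdX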

Thanks!
-- 
Matthew Booth
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


