Best practice for removing failing host from cluster?

We have a Ceph cluster running Octopus v15.2.3, and one of the 12 hosts
in the cluster has developed what appears to be a hardware issue that
causes it to freeze.  It began with a freeze and a 'CATERR' reported in
the server logs, and the host has frozen repeatedly over the last week.

I'm looking to safely isolate this host from the cluster while
troubleshooting.  I started by removing one of the host's OSDs with
`ceph orch osd rm XX` so that its data would rebalance off the host.
The host now has difficulty staying online for extended periods, so I
plan to remove all of its remaining OSDs, or the host itself, from the
cluster.  What would be the best way to do this?

Should I run `ceph orch osd rm XX` for each OSD on this host, or should
I set the weight of each of those OSDs to 0?  Can I do this while the
host is offline, or should I bring it back online before setting
weights or running `ceph orch osd rm`?
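
To make the two options concrete, here is a rough dry-run sketch of what
I have in mind.  It only prints the commands and does not run them.  The
helper name `print_drain_cmds` and the OSD ids 12 and 13 are hypothetical
placeholders, and I am assuming that "setting the weight to 0" means
`ceph osd crush reweight`; it could just as well mean `ceph osd reweight`.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the per-OSD commands for
# either option. The OSD ids passed in are hypothetical placeholders.
print_drain_cmds() {
    mode="$1"
    shift
    for id in "$@"; do
        case "$mode" in
            # Option A (assumption): set the CRUSH weight to 0 so that
            # data migrates off the OSD while it stays in the cluster.
            reweight) echo "ceph osd crush reweight osd.$id 0" ;;
            # Option B: let the orchestrator drain and remove the OSD.
            orch-rm)  echo "ceph orch osd rm $id" ;;
            *)        echo "unknown mode: $mode" >&2; return 1 ;;
        esac
    done
}

# Print both variants for two hypothetical OSD ids.
print_drain_cmds reweight 12 13
print_drain_cmds orch-rm 12 13
```

I have not run either variant against the whole host yet, so this is
only meant to show what I am choosing between.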

Thanks,
  Matt

-- 
Matt Larson, PhD
Madison, WI  53705 U.S.A.


