Removing failing OSD with cephadm?

I have an OSD that is causing slow ops and appears to be backed by a
failing drive, according to smartctl output.  I am using cephadm.  What is
the best way to remove this drive from the cluster, and what are the
proper steps to replace the disk?
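
I have also seen that cephadm provides an orchestrator-level removal that
drains the OSD and, with `--replace`, keeps the OSD id reserved for the
replacement disk.  A sketch of what I think that would look like (assuming
osd.35, as in the steps below):

`ceph orch osd rm 35 --replace`
`ceph orch osd rm status`

Is that preferable to the manual steps below, or are both fine?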

Mark osd.35 as out.

`sudo ceph osd out osd.35`
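
To confirm it took effect, I assume I can check that the REWEIGHT column
drops to 0 and then watch data movement with:

`ceph osd tree`
`ceph -s`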

Then mark osd.35 as down.

`sudo ceph osd down osd.35`
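
(I assume I can verify the up/down flag afterwards with
`ceph osd dump | grep osd.35`.)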

The OSD stays marked as out, but it does come back up after a couple of
seconds.  I do not know whether that is a problem, or whether I should just
let the drive stay online for as long as it lasts while it is being removed
from the cluster.
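
My understanding (possibly wrong) is that `ceph osd down` only updates the
OSD map, and a running daemon will simply report itself up again, so keeping
it down would mean stopping the daemon itself, e.g. via cephadm:

`ceph orch daemon stop osd.35`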

After the recovery completes, I would then `destroy` the OSD:

`ceph osd destroy {id} --yes-i-really-mean-it`
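
Before that, I assume I should confirm that recovery has finished and the
OSD is safe to remove, e.g.:

`ceph pg stat`
`ceph osd safe-to-destroy osd.35`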

(https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/)

Besides checking the steps above, my question now is: if the drive is acting
very slow and causing slow ops, should I be trying to shut down its OSD
and keep it down?  There is an example of stopping the OSD on the server
using systemctl, outside of cephadm:

ssh {osd-host}
sudo systemctl stop ceph-osd@{osd-num}
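
Since these daemons are managed by cephadm, I am guessing the host-level
equivalent would be the fsid-qualified unit (the {fsid} placeholder below is
just illustrative):

`sudo systemctl stop ceph-{fsid}@osd.{osd-num}.service`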


Thanks,
  Matt

-- 
Matt Larson, PhD
Madison, WI  53705 U.S.A.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


