I have an OSD that is causing slow ops and appears to be backed by a failing drive, according to smartctl output. I am using cephadm and am wondering: what is the best way to remove this drive from the cluster, and what are the proper steps to replace the disk?

First, mark osd.35 as out:

`sudo ceph osd out osd.35`

Then mark osd.35 as down:

`sudo ceph osd down osd.35`

The OSD is marked as out, but it comes back up after a couple of seconds. I do not know whether that is a problem, or whether I should just let the drive stay online for as long as it lasts while it is being removed from the cluster.

After the recovery completes, I would then `destroy` the OSD:

`ceph osd destroy {id} --yes-i-really-mean-it`

(https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/)

Besides checking the steps above, my question now is: if the drive is acting very slow and causing slow ops, should I be trying to shut down its OSD and keep it down? There is an example of stopping the OSD on the server using systemctl, outside of cephadm:

`ssh {osd-host}`
`sudo systemctl stop ceph-osd@{osd-num}`

Thanks,
 Matt

--
Matt Larson, PhD
Madison, WI 53705 U.S.A.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
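[For reference, the manual out/down/destroy steps in the question can also be driven through the cephadm orchestrator. A sketch, assuming osd.35 as in the question and a cephadm-managed cluster; these commands operate on a live cluster, so verify the flags against the documentation for your Ceph release before running them:]

```shell
# Optionally tell the orchestrator not to auto-create a new OSD on the
# failing device while it is being worked on.
ceph orch apply osd --all-available-devices --unmanaged=true

# Drain and remove osd.35. With --replace, the OSD is marked
# "destroyed" rather than purged, so its ID is preserved for the
# replacement disk.
ceph orch osd rm 35 --replace

# Watch the drain/removal progress.
ceph orch osd rm status

# To stop a cephadm-managed OSD daemon (instead of plain systemctl,
# which targets non-cephadm deployments):
ceph orch daemon stop osd.35
```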