Hi,

We recently removed an OSD from our Ceph cluster because its underlying disk has a hardware issue. We used the command:

    ceph orch osd rm osd_id --zap

During the removal process, the cluster sometimes enters a warning state with slow ops on this OSD. When that happens, our RGW fails to respond to requests and returns 503 errors. Restarting the RGW daemon makes it work again, but the same failure occurs from time to time. Eventually we noticed that the RGW 503 errors are a consequence of the OSD slow ops.

Our cluster has 18 hosts and 210 OSDs. We expected that removing a single OSD with a hardware issue would not impact cluster performance or RGW availability. Is that expectation reasonable? And what is the best way to handle an OSD with a hardware failure?

Thank you in advance for any comments or suggestions.

Best Regards,
Mary Zhang
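
P.S. For completeness, here is roughly the sequence involved. The removal command and the RGW restart are what we actually ran; the status and health checks are just the standard cephadm commands for monitoring, and osd_id / <rgw_daemon_name> are placeholders rather than our real values.

    # remove the failed OSD; --zap wipes its device once the OSD is drained
    ceph orch osd rm osd_id --zap

    # watch the draining / removal progress
    ceph orch osd rm status

    # shows the SLOW_OPS warning while RGW is returning 503s
    ceph health detail

    # workaround we used to recover RGW
    ceph orch daemon restart <rgw_daemon_name>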