Hi Rainer,

slow ops going away after an OSD goes down sounds a little bit like this thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/E6LSKCPXPQS4G3CZUQR6M2BK5SNIZ7PX/
Read errors indicate a dying disk; still, there might be something in common here. Do you have stats of the network load?

Setting an OSD down or out while it is running will not do anything (Octopus and earlier); the OSD marks itself up/in right away again. The subsequent "osd destroy" should therefore have failed with "osd nnn still up and in". You need to stop the daemon (the OSD marks itself down and then goes down), mark the OSD out (now it will stay out) and destroy it; a rough command sketch follows below the quoted message. I usually make a ddrescue copy first (also sketched below), so the data is available in case something goes haywire.

This is the manual procedure for clusters not deployed with cephadm. Cephadm has its own way of replacing disks that I cannot say anything about.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Rainer Krienke <krienke@xxxxxxxxxxxxxx>
Sent: 07 November 2022 09:20:44
To: ceph-users@xxxxxxx
Subject: How to manually take down an OSD

Hi,

this morning osd.77 in my Ceph Nautilus cluster with 144 OSDs on 9 hosts seemed not to be working correctly and caused slow ops:

ceph -s
  cluster:
    id:     7397a0cf-bfc6-4d25-aabb-be9f6564a13b
    health: HEALTH_WARN
            Reduced data availability: 6 pgs inactive, 8 pgs peering
            62 slow ops, oldest one blocked for 2703 sec, osd.77 has slow ops

Before that I had installed Ubuntu security updates and then rebooted the host with osd.77. Already before rebooting I saw some read errors on the console, so the disk of osd.77 is probably dying. I had a similar cluster behaviour some months ago, where an OSD also had slow ops and half a day later it died, so I could replace the disk.

With this history I now wanted to take osd.77 down and out of the cluster to replace the disk, but I was unsure how to do this. I thought this should be correct:

# ceph osd down 77
# ceph osd out 77
# ceph osd destroy 77

Would this in general be the right way to prepare a disk replacement?

Now strange but good things happened after the "ceph osd down 77": the command returned no error, but "ceph -s" still showed all OSDs up and in. I had expected one OSD to be down now, but it wasn't. Even stranger, the slow ops from osd.77 are also gone for the moment and the cluster is completely healthy again.

Thanks for your help
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
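
A rough sketch of the manual sequence described above, for a non-cephadm cluster. It assumes osd.77 sits on /dev/sdX; that device name is a placeholder, check the real one with "ceph-volume lvm list" before running anything:

  # stop the daemon; the OSD marks itself down and stays down
  systemctl stop ceph-osd@77

  # mark it out so it stays out and the cluster starts rebalancing
  ceph osd out 77

  # optional: wait until removing the OSD no longer reduces data safety
  while ! ceph osd safe-to-destroy osd.77 ; do sleep 60 ; done

  # keep the CRUSH position and OSD id, but mark the disk's contents as gone
  ceph osd destroy 77 --yes-i-really-mean-it

  # after swapping the disk: wipe the new device and recreate the OSD
  # under the same id
  ceph-volume lvm zap /dev/sdX --destroy
  ceph-volume lvm create --osd-id 77 --data /dev/sdX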
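
And the ddrescue copy I mentioned, taken before touching the failing disk; again, /dev/sdX (failing disk) and /dev/sdY (spare disk of at least the same size) are placeholders:

  # clone the dying disk onto a spare with 3 retries on bad sectors;
  # the map file lets the copy be interrupted and resumed
  ddrescue -f -r3 /dev/sdX /dev/sdY /root/osd77-rescue.map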