Hi,
this morning osd.77 in my Ceph Nautilus cluster (144 OSDs on 9 hosts)
seemed not to be working correctly; it was causing slow ops:
ceph -s
cluster:
id: 7397a0cf-bfc6-4d25-aabb-be9f6564a13b
health: HEALTH_WARN
Reduced data availability: 6 pgs inactive, 8 pgs peering
62 slow ops, oldest one blocked for 2703 sec, osd.77 has
slow ops
Before that I had installed Ubuntu security updates and then rebooted the
host with osd.77. Even before rebooting I saw some read errors on the
console, so the disk behind osd.77 is probably dying. I saw similar
cluster behaviour some months ago: another OSD had slow ops and half a
day later its disk died, so I replaced it.
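Before replacing anything I still plan to double-check the disk itself on
the OSD host, probably with something like this (assuming /dev/sdX is the
data disk behind osd.77):

# dmesg -T | grep sdX
# smartctl -a /dev/sdX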
With this history I wanted to take osd.77 down and out of the cluster to
replace the disk, but I was unsure how to do this. I thought the
following should be correct:
# ceph osd down 77
# ceph osd out 77
# ceph osd destroy 77
Would this in general be the right way to prepare for a disk replacement?
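For the record, the fuller sequence I had pieced together from the docs
looks roughly like this (the device name /dev/sdX, and that the
systemctl/ceph-volume steps are run on the OSD host, are my assumptions,
so please correct me if this is wrong):

# ceph osd out 77
# while ! ceph osd safe-to-destroy osd.77 ; do sleep 60 ; done
# systemctl stop ceph-osd@77                          # on the OSD host
# ceph osd destroy 77 --yes-i-really-mean-it
# ceph-volume lvm zap /dev/sdX --destroy              # on the OSD host, old disk
  ... physically swap the disk ...
# ceph-volume lvm create --osd-id 77 --data /dev/sdX  # on the OSD host, new disk

As far as I understand, "destroy" keeps the OSD id and its CRUSH position,
so the new disk should come back as osd.77 and only that one OSD's data
has to be backfilled.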
Then something strange but good happened after the "ceph osd down 77":
the command ran without an error, but "ceph -s" still showed all OSDs up
and in. I had expected one OSD to be down now, but it wasn't.
Even stranger, the slow ops from osd.77 are also gone for the moment and
the cluster is completely healthy again.
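For now I am just keeping an eye on osd.77 and the cluster with things
like:

# ceph health detail
# ceph osd tree
# ceph osd find 77

in case the slow ops come back.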
Thanks for your help
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html