Procedure when an OSD is down or an error is encountered during Ceph status checks (Ceph version 0.67.4)

1). Check whether the cluster has only just been started and has not yet finished starting its OSDs.

2). Ensure continuous hardware-level (out-of-band) access to the Ceph node:
 - either via a hardware serial console server and serial console redirection,
 - or via Video-Over-Net infrastructure.

3). Comment out (with a leading "#") the bad OSD drives in "/etc/fstab". This prevents the Ceph node from stopping during boot because of a bad OSD drive if it is rebooted automatically, intentionally or unintentionally.

4). Log in to the Ceph node with the bad OSD via network, serial or video console.
 - Check the OSD data directory for its identity file "whoami".
 -- Can you run "ls" on the OSD data directory?
 -- Can you unmount and remount the OSD disk on the OSD data directory "/var/lib/ceph/osd/ceph-X/"?
 -- Do you get an "input/output" error, which indicates a bad OSD disk or a filesystem error?
 -- If no, try to start osd.X: "service ceph start osd.X".
 -- If yes, continue with the next step.

5). Stop Ceph on this local node only: "service ceph stop".

6). Reboot the Ceph node.

7). Can you unmount and remount the OSD disk on the OSD data directory "/var/lib/ceph/osd/ceph-X/"?
 -- If yes, remove the "#" from the OSD drives in "/etc/fstab",
 -- then try to start all local Ceph OSDs: "service ceph start".
 --- In some cases the OSD trouble is resolved here.

8). If the OSD still does not work, comment out the bad OSD drives in "/etc/fstab" again,
 -- then start all local Ceph OSDs: "service ceph start",
 -- then stop the bad osd.X: "service ceph stop osd.X".

9). Ensure that all other working OSDs from this Ceph node are present in the cluster by running "ceph osd tree" and "service ceph -a status".

10). Procedure to replace a bad OSD disk: … to be continued …
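
The two mechanical parts of the procedure above (checking that an OSD data directory is readable, and commenting an OSD drive in or out of the fstab) could be sketched in shell roughly as follows. This is only an illustration, not a tested tool: the function names check_osd_dir, comment_out_mount and restore_mount are made up for this example, and the fstab helpers deliberately work as filters (stdin to stdout) so that they never edit "/etc/fstab" in place. They also assume the mount point is written in the fstab without a trailing slash.

```shell
#!/bin/sh
# Hypothetical helpers for the procedure above (names are illustrative only).

# check_osd_dir DIR
# Succeeds only if DIR can be listed and its "whoami" file can be read.
# On success the OSD id stored in "whoami" is printed. An input/output
# error from a bad disk makes ls or cat fail, so the function returns 1.
check_osd_dir() {
    ls "$1" >/dev/null 2>&1 || { echo "cannot list $1" >&2; return 1; }
    cat "$1/whoami" 2>/dev/null || { echo "cannot read $1/whoami" >&2; return 1; }
}

# comment_out_mount MOUNTPOINT
# Reads an fstab on stdin and writes a copy to stdout in which every active
# entry whose second field equals MOUNTPOINT is prefixed with "#".
comment_out_mount() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print "#" $0; next } { print }'
}

# restore_mount MOUNTPOINT
# Reverse of comment_out_mount: strips one leading "#" from entries whose
# second field equals MOUNTPOINT.
restore_mount() {
    awk -v mp="$1" '/^#/ && $2 == mp { sub(/^#/, ""); print; next } { print }'
}

# Example usage (osd number 3 and the output path are placeholders):
#   check_osd_dir /var/lib/ceph/osd/ceph-3
#   comment_out_mount /var/lib/ceph/osd/ceph-3 < /etc/fstab > /tmp/fstab.new
```

Review the generated file by hand before copying it over "/etc/fstab"; a mistake there can itself stop the node from booting.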
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com