Today I encountered multiple OSDs down, multiple OSDs that won’t start, and “Input/output” errors on OSD disk access

"Aquino, BenX O" <benx.o.aquino@xxxxxxxxx> · Fri, 15 Nov 2013 20:18:46 +0000

Procedure when an OSD is down or an error is encountered during Ceph status checks:
Ceph version 0.67.4
1). Check whether the cluster has just started and has not yet finished starting its OSDs.
2). Ensure continuous hardware (console) access to the Ceph node:

- either via a hardware serial console server with serial console redirection,

- or via Video-Over-Net infrastructure.

3). Comment out (prefix with #) the bad OSD drive entries in “/etc/fstab”.
   This prevents the Ceph node from stopping at boot because of a bad OSD drive
   if it is rebooted, whether automatically, intentionally, or unintentionally.
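A minimal sketch of step 3 in shell, assuming a standard whitespace-separated fstab in which field 2 is the mount point. The function name `comment_out_osd_fstab` is hypothetical (not part of Ceph); it takes the fstab path as an argument so it can be tried on a copy before touching /etc/fstab, and it keeps a `.bak` backup.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): comment out the fstab entry
# whose mount point (field 2) equals the given OSD data directory.
# Usage: comment_out_osd_fstab FSTAB MOUNTPOINT
comment_out_osd_fstab() {
    fstab=$1
    mnt=$2
    # Keep a backup; awk reads the backup and rewrites the original file.
    cp "$fstab" "$fstab.bak" || return 1
    awk -v m="$mnt" '
        BEGIN { sub(/\/+$/, "", m) }          # ignore a trailing slash
        {
            p = $2; sub(/\/+$/, "", p)
            # Prefix "#" only on active (not yet commented) matching lines.
            if ($1 !~ /^#/ && p == m) { print "#" $0 } else { print }
        }
    ' "$fstab.bak" > "$fstab"
}
```

On the node (as root) this would be called as `comment_out_osd_fstab /etc/fstab /var/lib/ceph/osd/ceph-X`. Running it twice does not add a second "#", because already commented lines are left alone.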

4). Log in to the Ceph node with the bad OSD via network, serial console, or video.
   - Check the OSD data directory for its identity file “whoami”.
   -- Can you run ls on the OSD data directory?
   -- Can you unmount and remount the OSD disk on the OSD data directory “/var/lib/ceph/osd/ceph-X/”?
   -- Do you get an “Input/output” error, indicating a bad OSD disk or a filesystem error?
   -- If no, try to start osd.X with “service ceph start osd.X”.
   -- If yes, continue with the next step.
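The checks in step 4 could be scripted roughly as below. This is a sketch under assumptions: `check_osd_dir` is a hypothetical name, and it relies only on `ls` failing and printing “Input/output error” (the usual EIO text in an English locale) when the disk or filesystem is bad. On success it prints the OSD id from “whoami”, which is the X to use in “service ceph start osd.X”.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): verify that an OSD data directory
# is readable and print the OSD id stored in its "whoami" file.
# Usage: check_osd_dir /var/lib/ceph/osd/ceph-X
check_osd_dir() {
    dir=$1
    # Capture only stderr from ls; its stdout is discarded.
    if ! err=$(ls "$dir" 2>&1 >/dev/null); then
        case $err in
            *"Input/output error"*)
                echo "EIO on $dir: bad OSD disk or filesystem error" >&2 ;;
            *)
                echo "cannot list $dir: $err" >&2 ;;
        esac
        return 1
    fi
    if ! id=$(cat "$dir/whoami" 2>/dev/null); then
        echo "cannot read $dir/whoami" >&2
        return 1
    fi
    echo "$id"
}
```

A possible use on the node: `id=$(check_osd_dir /var/lib/ceph/osd/ceph-3) && service ceph start "osd.$id"`.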

5). Stop Ceph on this local node only, with “service ceph stop”.
6). Reboot the Ceph node (“# reboot”).
7). Can you unmount and remount the OSD disk on the OSD data directory “/var/lib/ceph/osd/ceph-X/”?
   -- If yes, undo the fstab change: remove the # from the OSD drive entries in “/etc/fstab”.
   -- Then try to start all local Ceph OSDs with “service ceph start”.
   --- In some OSD trouble cases, the problem is resolved here.
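Reverting the fstab change in step 7 could look like the following. Again only a sketch: `uncomment_osd_fstab` is a hypothetical name, it assumes the entry was disabled with a single leading # at the start of the line (as in step 3), and it re-enables only lines whose mount point matches, leaving ordinary comment lines alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): re-enable a commented-out fstab
# entry whose mount point matches the given OSD data directory.
# Usage: uncomment_osd_fstab FSTAB MOUNTPOINT
uncomment_osd_fstab() {
    fstab=$1
    mnt=$2
    cp "$fstab" "$fstab.bak" || return 1
    awk -v m="$mnt" '
        BEGIN { sub(/\/+$/, "", m) }          # ignore a trailing slash
        /^#/ {
            line = substr($0, 2)              # text after the leading #
            split(line, f)                    # f[2] is the mount point field
            p = f[2]; sub(/\/+$/, "", p)
            if (p == m) { print line; next }
        }
        { print }
    ' "$fstab.bak" > "$fstab"
}
```

After a successful unmount and remount, `uncomment_osd_fstab /etc/fstab /var/lib/ceph/osd/ceph-X` would be followed by “service ceph start”.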

8). If the OSD still does not work:

   -- Comment out (prefix with #) the bad OSD drive entries in “/etc/fstab”.
   -- Then start all local Ceph OSDs with “service ceph start”.
   -- Then stop the bad osd.X with “service ceph stop osd.X”.
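Step 8 comes down to two service commands in a fixed order. The sketch below wraps them in a hypothetical `isolate_bad_osd` function that only prints the commands by default (a dry run), so the sequence can be checked before running it as root on the node; passing an empty second argument executes them. It assumes the bad drive's fstab entry has already been commented out.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Ceph) for step 8: start all local OSDs,
# then stop only the bad one. The second argument is a command prefix:
# by default "echo" (dry run, commands are printed); "" runs them for real.
# Usage: isolate_bad_osd X [PREFIX]
isolate_bad_osd() {
    id=$1
    run=${2-echo}
    # $run is deliberately unquoted so an empty prefix disappears.
    # The stop is issued even if the start reports a failure for osd.X.
    $run service ceph start
    $run service ceph stop "osd.$id"
}
```

For example, `isolate_bad_osd 3` prints the two commands, while `isolate_bad_osd 3 ""` runs them.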

9). Ensure that all other working OSDs on this Ceph node are present and up in the cluster,
by running “ceph osd tree” and “service ceph -a status”.
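For step 9, a small filter over “ceph osd tree” output can list OSDs that are not up. This sketch assumes the plain-text tree format in which each OSD row contains its name (osd.N) immediately followed by its up/down status; `down_osds` is a hypothetical name, and the format should be checked against the output of your Ceph version before relying on it.

```shell
#!/bin/sh
# Hypothetical filter (not part of Ceph): read "ceph osd tree" text on
# stdin and print the name of every OSD whose status column is "down".
# Assumes each OSD row has the name osd.N immediately followed by up/down.
# Usage: ceph osd tree | down_osds
down_osds() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down") print $i
    }'
}
```

Empty output means every OSD listed in the tree is reported as up.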

10). Procedure to replace a bad OSD disk: … to be continued.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com