Thanks Kyle -- I'll look into and try out udev and upstart.

Yes on setting "noout" -- definitely a good idea until we're sure that OSD is gone for good.

If the OSD disk is totally gone, then:
- mark the OSD down and out,
- remove it from the crushmap / update the crushmap,
- verify the crushmap,
- then use ceph-deploy to add a replacement OSD with the same osd number.

Does this sound about right?

-Ben

-----Original Message-----
From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Kyle Bader
Sent: Friday, November 15, 2013 12:58 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Today I've encountered multiple OSD down and multiple OSD won't start and OSD disk access "Input/Output" error

> 3). Comment out (#) the bad OSD drives in "/etc/fstab".

This is unnecessary if you're using the provided upstart and udev scripts; OSD data devices will be identified by label and mounted. If you choose not to use the upstart and udev scripts, then you should write init scripts that do something similar, so that you don't need /etc/fstab entries.

> 3). Log in to the Ceph node with the bad OSD via net/serial/video.

I'd put "check dmesg" somewhere near the top of this section -- often, if you lose an OSD due to a filesystem hiccup, it will be evident in the dmesg output.

> 4). Stop only this local Ceph node with "service ceph stop".

You may want to set "noout" depending on whether you expect it to come back online within your "mon osd down out interval" threshold.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
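P.S. For what it's worth, the replacement steps described above might be sketched roughly like this. This is only a sketch: the osd id (12), the host name (cephnode1), and the device path (/dev/sdX) are hypothetical placeholders, not values from this thread, and these commands mutate cluster state, so run them only once you're sure the OSD is unrecoverable.

```shell
# Sketch of the failed-OSD replacement steps; osd id 12, host "cephnode1",
# and /dev/sdX are hypothetical placeholders -- substitute your own values.

# Keep the cluster from rebalancing while the dead OSD is removed.
ceph osd set noout

# Mark the OSD out (its peers have likely already reported it down).
ceph osd out 12

# Remove it from the crushmap, delete its auth key, and delete the OSD.
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12

# Verify the crushmap no longer references osd.12.
ceph osd tree

# From the admin node, prepare the replacement disk with ceph-deploy;
# the new OSD should pick up the freed osd number.
ceph-deploy osd create cephnode1:/dev/sdX

# Let the cluster mark OSDs out normally again and rebalance.
ceph osd unset noout
```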
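On the "mon osd down out interval" point, a quick way to check the threshold and manage the flag might look like this -- a sketch, assuming the stock admin-socket path and a monitor id of "a"; adjust both for your install.

```shell
# Ask the local monitor for its down-out interval via the admin socket
# (path/id shown are the stock layout and a hypothetical mon id "a").
ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config get mon_osd_down_out_interval

# If the node will be down longer than that interval, set noout so the
# cluster does not start re-replicating data off the down OSDs:
ceph osd set noout

# Confirm the flag is set, then clear it once the node is back:
ceph osd dump | grep flags
ceph osd unset noout
```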
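And on the "check dmesg" suggestion, something like this near the top of the runbook would surface most disk-level causes; the grep pattern is just a starting point, not exhaustive.

```shell
# Scan the kernel log for I/O, SCSI/ATA, and filesystem errors that
# would explain a suddenly-dead OSD. "|| true" keeps the pipeline from
# signalling failure when the log is clean.
dmesg | grep -iE 'i/o error|ata[0-9]|sd[a-z]|xfs|ext4|btrfs' | tail -n 50 || true

# smartctl (from smartmontools) is also worth a look for the suspect disk:
# smartctl -a /dev/sdX
```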