Thanks Kyle -- I'll look into and try out udev and upstart.

Yes on setting "noout" -- definitely a good idea until we're sure that OSD is gone for good.

If the OSD disk is totally gone, then:
- mark the OSD down and out,
- remove it from the crushmap / update the crushmap,
- verify the crushmap,
- then use ceph-deploy to add a replacement OSD with the same osd number.

Does this sound about right?

-Ben

-----Original Message-----
From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Kyle Bader
Sent: Friday, November 15, 2013 12:58 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Today I've encountered multiple OSD down and multiple OSD won't start and OSD disk access "Input/Output" error

> 3). Comment out (#) the bad OSD drives in "/etc/fstab".

This is unnecessary if you're using the provided upstart and udev scripts; OSD data devices will be identified by label and mounted. If you choose not to use the upstart and udev scripts, then you should write init scripts that do something similar, so that you don't need /etc/fstab entries.

> 3). Log in to the Ceph node with the bad OSD via net/serial/video.

I'd put "check dmesg" somewhere near the top of this section -- often, if you lose an OSD due to a filesystem hiccup, it will be evident in the dmesg output.

> 4). Stop only this local Ceph node with "service ceph stop".

You may want to set "noout" depending on whether you expect it to come back online within your "mon osd down out interval" threshold.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
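P.S. For what it's worth, the replacement steps described above might be sketched roughly like this. This is only a sketch: the osd id (12), the host name (cephnode1), and the device path (/dev/sdX) are hypothetical placeholders, not values from this thread, and these commands mutate cluster state, so run them only once you're sure the OSD is unrecoverable.

```shell
# Sketch of the failed-OSD replacement steps; osd id 12, host "cephnode1",
# and /dev/sdX are hypothetical placeholders -- substitute your own values.

# Keep the cluster from rebalancing while the dead OSD is removed.
ceph osd set noout

# Mark the OSD out (its peers have likely already reported it down).
ceph osd out 12

# Remove it from the crushmap, delete its auth key, and delete the OSD.
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12

# Verify the crushmap no longer references osd.12.
ceph osd tree

# From the admin node, prepare the replacement disk with ceph-deploy;
# the new OSD should pick up the freed osd number.
ceph-deploy osd create cephnode1:/dev/sdX

# Let the cluster mark OSDs out normally again and rebalance.
ceph osd unset noout
```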
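On the "mon osd down out interval" point, a quick way to check the threshold and manage the flag might look like this -- a sketch, assuming the stock admin-socket path and a monitor id of "a"; adjust both for your install.

```shell
# Ask the local monitor for its down-out interval via the admin socket
# (path/id shown are the stock layout and a hypothetical mon id "a").
ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config get mon_osd_down_out_interval

# If the node will be down longer than that interval, set noout so the
# cluster does not start re-replicating data off the down OSDs:
ceph osd set noout

# Confirm the flag is set, then clear it once the node is back:
ceph osd dump | grep flags
ceph osd unset noout
```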
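And on the "check dmesg" suggestion, something like this near the top of the runbook would surface most disk-level causes; the grep pattern is just a starting point, not exhaustive.

```shell
# Scan the kernel log for I/O, SCSI/ATA, and filesystem errors that
# would explain a suddenly-dead OSD. "|| true" keeps the pipeline from
# signalling failure when the log is clean.
dmesg | grep -iE 'i/o error|ata[0-9]|sd[a-z]|xfs|ext4|btrfs' | tail -n 50 || true

# smartctl (from smartmontools) is also worth a look for the suspect disk:
# smartctl -a /dev/sdX
```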