> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> David Z
> Sent: Wednesday, November 12, 2014 8:16 AM
> To: Ceph Community; Ceph-users
> Subject: The strategy of auto-restarting crashed OSD
>
> Hi Guys,
>
> We have recently been experiencing some OSD crashes, such as messenger
> crashes and some strange crashes (still being investigated). Those crashes
> do not seem to reproduce after restarting the OSD.
>
> So we are thinking about a strategy of auto-restarting a crashed OSD 1 or
> 2 times, then leaving it down if restarting doesn't work. This strategy might
> help reduce, to some extent, the impact of PG peering and recovery on online
> traffic, since we won't mark an OSD out automatically even if it is down
> unless we are sure it is a disk failure.
>
> However, we are also aware that this strategy may bring us some problems.
> Since you guys have more experience with Ceph, we would like to hear
> some suggestions from you.
>
> Thanks.
>
> David Zhang
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

I'm currently looking at the same scenario of having to restart crashed OSDs. I'm looking at using runit (http://smarden.org/runit/ and http://smarden.org/runit/useinit.html) to manage the OSDs. I'll probably modify my init script to send me a trap or email when an OSD is restarted, though.
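
For illustration, the "restart 1 or 2 times, then leave it down" policy discussed in this thread could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not how runit or the Ceph init scripts actually behave: the function name `supervise`, its parameters, and the `notify` hook (standing in for the trap or email) are all hypothetical, and it assumes the daemon is started in the foreground so its exit status can be observed.

```python
import subprocess
import sys
import time
from typing import Callable, List


def supervise(cmd: List[str],
              max_restarts: int = 2,
              delay_s: float = 0.0,
              notify: Callable[[str], None] = print) -> int:
    """Run cmd in the foreground and restart it after an abnormal exit,
    at most max_restarts times. Returns the number of restarts performed.

    Hypothetical helper for illustration only; notify stands in for
    whatever alerting (SNMP trap, email) the operator wires up.
    """
    restarts = 0
    while True:
        # Blocks until the process exits; a non-zero code is treated as a crash.
        rc = subprocess.run(cmd).returncode
        if rc == 0:
            notify(f"{cmd[0]} exited cleanly; not restarting")
            return restarts
        if restarts >= max_restarts:
            # Restart budget exhausted: leave the daemon down for a human to inspect.
            notify(f"{cmd[0]} exited with {rc} after {restarts} restarts; leaving it down")
            return restarts
        restarts += 1
        notify(f"{cmd[0]} exited with {rc}; restart {restarts} of {max_restarts}")
        time.sleep(delay_s)


# Example (assumption: an OSD run in the foreground, e.g. `ceph-osd -f -i 0`):
# supervise(["ceph-osd", "-f", "-i", "0"], max_restarts=2, delay_s=5.0)
```

Capping the restart count is the point of the design: an OSD that keeps crashing (for example because of a failing disk) is left down rather than flapping, while a one-off crash is recovered without operator action.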