Should I infer from the silence that there is no way to recover from the
"FAILED assert(last_e.version.version < e.version.version)" errors?

Thanks,
     Jeff

----- Forwarded message from Jeff <jeff@xxxxxxxxxxxxxxxxxxx> -----

Date: Tue, 17 Feb 2015 09:16:33 -0500
From: Jeff <jeff@xxxxxxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Power failure recovery woes

Some additional information/questions:

Here is the output of "ceph osd tree" (at the end of this message).

Some of the "down" OSDs actually have a running ceph-osd process but are
still marked "down" by the cluster. For example osd.1:

    root     30158  8.6 12.7 1542860 781288 ?  Ssl  07:47  4:40 /usr/bin/ceph-osd --cluster=ceph -i 0 -f

Is there any way to get the cluster to recognize them as being up? osd.1
has the "FAILED assert(last_e.version.version < e.version.version)" errors.

Thanks,
     Jeff

# id    weight  type name       up/down reweight
-1      10.22   root default
-2      2.72            host ceph1
0       0.91                    osd.0   up      1
1       0.91                    osd.1   down    0
2       0.9                     osd.2   down    0
-3      1.82            host ceph2
3       0.91                    osd.3   down    0
4       0.91                    osd.4   down    0
-4      2.04            host ceph3
5       0.68                    osd.5   up      1
6       0.68                    osd.6   up      1
7       0.68                    osd.7   up      1
8       0.68                    osd.8   down    0
-5      1.82            host ceph4
9       0.91                    osd.9   up      1
10      0.91                    osd.10  down    0
-6      1.82            host ceph5
11      0.91                    osd.11  up      1
12      0.91                    osd.12  up      1

On 2/17/2015 8:28 AM, Jeff wrote:
>
> -------- Original Message --------
> Subject: Re: Power failure recovery woes
> Date: 2015-02-17 04:23
> From: Udo Lembke <ulembke@xxxxxxxxxxxx>
> To: Jeff <jeff@xxxxxxxxxxxxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx
>
> Hi Jeff,
> is the OSD /var/lib/ceph/osd/ceph-2 mounted?
>
> If not, does it help if you mount the OSD and start it with
>     service ceph start osd.2
> ?
>
> Udo
>
> On 17.02.2015 at 09:54, Jeff wrote:
>> Hi,
>>
>> We had a nasty power failure yesterday, and even with UPSes our small
>> (5-node, 12-OSD) cluster is having problems recovering.
>>
>> We are running ceph 0.87.
>>
>> Three of our OSDs are down consistently (others stop and can be
>> restarted, but our cluster is so slow that almost everything we do
>> times out).
>>
>> We are seeing errors like this on the OSDs that never run:
>>
>>     ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
>>     Operation not permitted
>>
>> We are seeing errors like these on the OSDs that run some of the time:
>>
>>     osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
>>     e.version.version)
>>     common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>>
>> Does anyone have any suggestions on how to recover our cluster?
>>
>> Thanks!
>>     Jeff

----- End forwarded message -----
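
A minimal sketch of the mount-and-restart sequence Udo suggests, assuming
the sysvinit service scripts that ship with ceph 0.87 and the default OSD
data path; /dev/sdb1 below is a placeholder for whatever device actually
backs the OSD:

    # check whether the OSD's data directory is mounted (exit status 0 if yes)
    mountpoint /var/lib/ceph/osd/ceph-2

    # if it is not mounted, mount the backing device first (placeholder device name)
    mount /dev/sdb1 /var/lib/ceph/osd/ceph-2

    # start the daemon through the sysvinit wrapper
    service ceph start osd.2

    # confirm whether the cluster now reports the OSD as up
    ceph osd tree

If the ceph-osd process is running but the OSD is still marked down (Jeff's
osd.1 case), that generally means the daemon is crashing or hanging before
it can report in, as the PGLog assert and suicide-timeout errors suggest;
the OSD's own log under /var/log/ceph/ is the next place to look.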