Hi,
We had a nasty power failure yesterday, and even with UPSes our small
(5-node, 12-OSD) cluster is having problems recovering.
We are running Ceph 0.87.
Three of our OSDs are consistently down. Others stop but can be
restarted; however, the cluster is so slow that almost everything we do
times out.
We are seeing errors like this on the OSDs that never start:
ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1) Operation not permitted
We are seeing errors like these on the OSDs that run some of the time:
osd/PGLog.cc: 844: FAILED assert(last_e.version.version < e.version.version)
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
Does anyone have any suggestions on how to recover our cluster?
Thanks!
Jeff
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com