OSDs crashing frequently

Hello fellow Cephers!

Recently, both before and after the upgrade from 0.77 to 0.78, about half the OSDs in my cluster have been crashing frequently with 'osd/PG.cc: 5255: FAILED assert(0 == "we got a bad state machine event")'

I'm not sure if this is a bug (there are some similar-sounding reports in Redmine already), or a configuration/corruption issue on my cluster.

I've got 22 OSDs on 5 hosts, running 0.78 across the board. Any pointers would be appreciated! I'd like to track down and resolve the issue, since it's causing a lot of stalled requests from clients and leaves the cluster in a generally unhealthy state.

Here's a fresh log file (~3MiB) from one OSD that crashed (old log moved aside before restarting after crash):
http://www.aarontc.com/logs/ceph-osd.4.log
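
In case it helps anyone compare which OSDs are affected, here is a minimal sketch of how the assert could be tallied per log file. It is only an illustration: the function names are made up for this example, the log location /var/log/ceph/ceph-osd.*.log is an assumption about a default install, and the line number in PG.cc is matched loosely since it may differ between builds. A single crash can print the assert more than once in a log, so the numbers are hit counts rather than exact crash counts.

```python
import glob
import re
from collections import Counter

# Matches the assertion text quoted above. The source line number is left
# flexible (\d+) because it may differ between builds.
ASSERT_RE = re.compile(
    r'osd/PG\.cc: \d+: FAILED assert\(0 == "we got a bad state machine event"\)'
)


def count_asserts(lines):
    """Return how many of the given log lines contain the assert message."""
    return sum(1 for line in lines if ASSERT_RE.search(line))


def count_by_log(paths):
    """Map each log file path to the number of lines containing the assert."""
    counts = Counter()
    for path in paths:
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, errors="replace") as fh:
            counts[path] = count_asserts(fh)
    return counts


if __name__ == "__main__":
    # Assumed log location; adjust the glob for your own hosts.
    logs = glob.glob("/var/log/ceph/ceph-osd.*.log")
    for path, hits in sorted(count_by_log(logs).items()):
        print(f"{path}: {hits}")
```

Running it on each host and comparing the counts should show whether the hits cluster on particular OSDs or are spread evenly across the cluster.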

Thanks,
-Aaron

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
