I saw this happen on a weekly basis when I was running the 0.61 branch as well; however, after switching to the 0.67 branch it stopped. Perhaps you should try upgrading.
Andrei
To: "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
Sent: Saturday, 28 December, 2013 9:48:12 AM
Subject: One OSD always dying
Hi,
One of my OSDs keeps dying. I rebooted every node, one after another, and made sure that all of them have the same kernel version and glibc.
I’m using ceph version 0.61.9 (7440dcd135750839fa0f00263f80722ff6f51e90).
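A small comparison like the following could help double-check that every node really runs the same kernel, glibc and Ceph build. It is a minimal sketch under stated assumptions: the node names and version strings in the inventory are hypothetical, and the per-host collection step is left out. The values could be gathered, for example, with `uname -r`, `ldd --version` and `ceph --version` on each host.

```python
from collections import defaultdict


def find_mismatches(inventory):
    """Return {field: {value: [nodes]}} for every field whose value differs between nodes."""
    by_field = defaultdict(lambda: defaultdict(list))
    for node, info in inventory.items():
        for field, value in info.items():
            by_field[field][value].append(node)
    # Keep only fields where more than one distinct value was seen.
    return {
        field: {value: sorted(nodes) for value, nodes in values.items()}
        for field, values in by_field.items()
        if len(values) > 1
    }


# Hypothetical inventory: node names and version strings are made up for illustration.
inventory = {
    "node1": {"kernel": "3.2.0-57-generic", "glibc": "2.15", "ceph": "0.61.9"},
    "node2": {"kernel": "3.2.0-57-generic", "glibc": "2.15", "ceph": "0.61.9"},
    "node3": {"kernel": "3.2.0-58-generic", "glibc": "2.15", "ceph": "0.61.9"},
}
print(find_mismatches(inventory))
```

An empty result means every field matched on all nodes.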
Dmesg only shows:
[ 5745.366041] init: ceph-osd (ceph/3) main process (2510) killed by ABRT signal
[ 5745.366235] init: ceph-osd (ceph/3) main process ended, respawning
[ 5763.824298] init: ceph-osd (ceph/3) main process (2991) killed by SEGV signal
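To see how often an OSD is being killed, and by which signal, the upstart lines above can be tallied per job instance. This is a rough sketch that assumes the dmesg line format shown in this message. The regular expression is written for these lines only and may need adjusting for other init systems or kernels.

```python
import re
from collections import Counter

# Matches upstart lines like:
# "[ 5745.366041] init: ceph-osd (ceph/3) main process (2510) killed by ABRT signal"
KILLED_RE = re.compile(
    r"\[\s*(?P<uptime>[\d.]+)\]\s+init:\s+(?P<job>\S+)\s+\((?P<instance>[^)]+)\)\s+"
    r"main process \((?P<pid>\d+)\) killed by (?P<signal>\w+) signal"
)


def crash_signals(dmesg_lines):
    """Count (instance, signal) pairs for every 'killed by ... signal' line."""
    counts = Counter()
    for line in dmesg_lines:
        m = KILLED_RE.search(line)
        if m:
            counts[(m.group("instance"), m.group("signal"))] += 1
    return counts


sample = [
    "[ 5745.366041] init: ceph-osd (ceph/3) main process (2510) killed by ABRT signal",
    "[ 5745.366235] init: ceph-osd (ceph/3) main process ended, respawning",
    "[ 5763.824298] init: ceph-osd (ceph/3) main process (2991) killed by SEGV signal",
]
print(crash_signals(sample))
```

Lines such as "main process ended, respawning" are skipped, so only actual kills are counted.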
Every time it crashes, this shows up in the logs:
2013-12-28 06:35:08.489431 7fc9eccd5700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7fc9eccd5700 time 2013-12-28 06:35:08.487862
osd/ReplicatedPG.cc: 1379: FAILED assert(0)
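When reporting the crash or searching the tracker, it can help to pull the failing function and source location out of the log. The following is a minimal sketch that assumes the two-line assert format shown above, with "In function '...'" on one line and "FILE: LINE: FAILED assert(...)" on the next. The function name `summarize_assert` is hypothetical.

```python
import re

# "osd/ReplicatedPG.cc: In function '...'" -> the function that hit the assert
FUNC_RE = re.compile(r"(?P<file>\S+\.cc): In function '(?P<func>[^']+)'")
# "osd/ReplicatedPG.cc: 1379: FAILED assert(0)" -> file, line and asserted expression
ASSERT_RE = re.compile(r"(?P<file>\S+\.cc): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)")


def summarize_assert(log_text):
    """Return file/line/expression/function of the first FAILED assert, or None."""
    failed = ASSERT_RE.search(log_text)
    if not failed:
        return None
    func = FUNC_RE.search(log_text)
    return {
        "file": failed.group("file"),
        "line": int(failed.group("line")),
        "expr": failed.group("expr"),
        "function": func.group("func") if func else None,
    }


log_excerpt = (
    "2013-12-28 06:35:08.489431 7fc9eccd5700 -1 osd/ReplicatedPG.cc: In function "
    "'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' "
    "thread 7fc9eccd5700 time 2013-12-28 06:35:08.487862\n"
    "osd/ReplicatedPG.cc: 1379: FAILED assert(0)\n"
)
print(summarize_assert(log_excerpt))
```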
If you need more information, I will send it. Please help! The whole cluster isn’t working properly because of this…
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com