Re: One OSD always dying

"Rottmann, Jonas (centron GmbH)" <J.Rottmann@xxxxxxxxxx> · Wed, 15 Jan 2014 09:49:21 +0000

Hi,

I have now upgraded to Dumpling (ceph version 0.67.5 (a60ac9194718083a4b6a225fc17cad6096c69bd1)), but the OSD still fails at startup with a trace.

Here's the trace:

http://paste.ubuntu.com/6755307/

If you need any more information, I will provide it. Can someone please help?

Thanks

From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Rottmann, Jonas (centron GmbH)
Sent: Monday, 30 December 2013 09:30
To: 'Andrei Mikhailovsky'
Cc: ceph-users@xxxxxxxx
Subject: Re: [ceph-users] One OSD always dying

Hi Andrei,

This is the first time I've run into this. How do I fix it? Upgrading a cluster that is not fully healthy doesn't seem like a great idea.

Once it's fixed, I will perform the upgrade ASAP.

Thanks for your help so far.

From: Andrei Mikhailovsky [mailto:andrei@xxxxxxxxxx]
Sent: Sunday, 29 December 2013 09:40
To: Rottmann, Jonas (centron GmbH)
Cc: ceph-users@xxxxxxxx
Subject: Re: [ceph-users] One OSD always dying

Jonas,

I saw this happen on a weekly basis when I was running the 0.61 branch as well, but it stopped after I switched to the 0.67 branch. Perhaps you should try upgrading.

Andrei

From: "Jonas Rottmann (centron GmbH)" <J.Rottmann@xxxxxxxxxx>
To: "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
Sent: Saturday, 28 December, 2013 9:48:12 AM
Subject: [ceph-users] One OSD always dying

Hi,

One of my OSDs keeps dying. I rebooted every node, one after another, and made sure they all run the same kernel and glibc versions.

I’m using ceph version 0.61.9 (7440dcd135750839fa0f00263f80722ff6f51e90).

Dmesg only shows:

[ 5745.366041] init: ceph-osd (ceph/3) main process (2510) killed by ABRT signal
[ 5745.366235] init: ceph-osd (ceph/3) main process ended, respawning
[ 5763.824298] init: ceph-osd (ceph/3) main process (2991) killed by SEGV signal
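To make the kill/respawn cycle easier to see when the log gets longer, here is a minimal sketch that pulls the PID and signal out of upstart `init:` lines. It assumes the exact message format shown above. The helper name `crash_events` and the sample `DMESG` text are illustrative only and are not part of Ceph or upstart.

```python
import re
from collections import Counter

# Assumed format (copied from the dmesg output above), e.g.
# "[ 5745.366041] init: ceph-osd (ceph/3) main process (2510) killed by ABRT signal"
KILLED = re.compile(
    r"\[\s*(?P<ts>[\d.]+)\]\s+init: (?P<job>\S+) \((?P<inst>[^)]+)\) "
    r"main process \((?P<pid>\d+)\) killed by (?P<sig>[A-Z]+) signal"
)

def crash_events(lines):
    """Return (timestamp, instance, pid, signal) for each 'killed by' line.

    'respawning' lines and any other output are ignored.
    """
    events = []
    for line in lines:
        m = KILLED.search(line)
        if m:
            events.append((float(m["ts"]), m["inst"], int(m["pid"]), m["sig"]))
    return events

# Sample input: the three dmesg lines quoted in the report.
DMESG = """\
[ 5745.366041] init: ceph-osd (ceph/3) main process (2510) killed by ABRT signal
[ 5745.366235] init: ceph-osd (ceph/3) main process ended, respawning
[ 5763.824298] init: ceph-osd (ceph/3) main process (2991) killed by SEGV signal
"""

if __name__ == "__main__":
    events = crash_events(DMESG.splitlines())
    for ts, inst, pid, sig in events:
        print(f"{ts:>12.6f}  {inst}  pid={pid}  {sig}")
    # Tally of which signals killed the daemon.
    print(Counter(sig for *_, sig in events))
```

You could feed real `dmesg` output into `crash_events` instead of the sample string to count how often `ceph/3` is killed and by which signal.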

This shows up in the logs every time:

2013-12-28 06:35:08.489431 7fc9eccd5700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7fc9eccd5700 time 2013-12-28 06:35:08.487862
osd/ReplicatedPG.cc: 1379: FAILED assert(0)

If you need more information, I will send it. Please help! The whole cluster isn't working properly because of this…

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com