Re: ceph segfault on all osd

[Please keep all mail on the list.]  

Hmm, that OSD log doesn't show a crash. I thought you said they were all crashing? Do they come up okay when you turn them back on again?
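 
If they don't come back cleanly, a quick check (just a sketch; the OSD id and the sysvinit-style commands are taken from your setup and may need adjusting) is to restart a single OSD and watch whether it stays up:
 
service ceph start osd.0    # on ceph-osd1: start the local OSD again
ceph osd stat               # should return to "4 osds: 4 up, 4 in" once it rejoins
ceph -w                     # watch whether it drops out or segfaults again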
-Greg

Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wednesday, April 10, 2013 at 9:27 AM, Witalij Poljatchek wrote:

> The log files are attached.
> 
> Thank you! :)
>  
> On 04/10/2013 06:06 PM, Gregory Farnum wrote:
> > [Re-adding the list.]
> >  
> > When the OSDs crash they will print out to their log a short description of what happened, with a bunch of function names.
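 
(To be concrete: something along these lines should pull that report out of an OSD log. This is only a sketch; it assumes the default log location, and the exact wording of the signal line can vary.)
 
grep -A 30 "Caught signal" /var/log/ceph/ceph-osd.0.log    # show the crash line plus the backtrace that follows it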
> >  
> > Unfortunately the problem you've run into is probably non-trivial to solve as you've introduced a bit of a weird situation into the permanent record that your OSDs need to process. I've created a bug (http://tracker.ceph.com/issues/4699), you can follow that. :)
> > -Greg
> > Software Engineer #42 @ http://inktank.com | http://ceph.com
> >  
> >  
> > On Wednesday, April 10, 2013 at 8:57 AM, Witalij Poljatchek wrote:
> >  
> > > There is no data.
> > > 
> > > They are just plain OSDs.
> > >  
> > >  
> > > What do you mean by a backtrace? An strace of the ceph-osd process?
> > >  
> > >  
> > > It is easy to reproduce:
> > > 
> > > Set up a plain cluster
> > > 
> > > and then run:
> > > 
> > > ceph osd pool set rbd size 0
> > > 
> > > After a minute, run:
> > > 
> > > ceph osd pool set rbd size 2
> > > 
> > > That's all.
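 
(For the record, the reproduction boils down to roughly the following. The "ceph osd pool get" checks are my addition, just to confirm each change took effect, and assume your version supports that subcommand.)
 
ceph osd pool set rbd size 0    # drop the rbd pool to zero replicas
ceph osd pool get rbd size      # confirm it now reports size: 0
# wait about a minute
ceph osd pool set rbd size 2    # restore two replicas
ceph osd pool get rbd size      # confirm it now reports size: 2
ceph -w                         # shortly afterwards the ceph-osd processes start segfaulting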
> > >  
> > >  
> > >  
> > >  
> > >  
> > >  
> > > On 04/10/2013 05:24 PM, Gregory Farnum wrote:
> > > > Sounds like they aren't handling the transition very well when trying to work out which old OSDs might have held the PGs. Are you trying to salvage the data that was in the pool, or can you throw it away?
> > > > Can you post the backtrace they're producing?
> > > > -Greg
> > > > Software Engineer #42 @ http://inktank.com | http://ceph.com
> > > >  
> > > >  
> > > > On Wednesday, April 10, 2013 at 3:59 AM, Witalij Poljatchek wrote:
> > > >  
> > > > > Hello,
> > > > >  
> > > > > I need help solving a segfault on all the OSDs in my test cluster.
> > > > >  
> > > > >  
> > > > > I set up Ceph from scratch:
> > > > > service ceph -a start
> > > > >  
> > > > > ceph -w
> > > > > health HEALTH_OK
> > > > > monmap e1: 3 mons at {1=10.200.20.1:6789/0,2=10.200.20.2:6789/0,3=10.200.20.3:6789/0}, election epoch 6, quorum 0,1,2 1,2,3
> > > > > osdmap e5: 4 osds: 4 up, 4 in
> > > > > pgmap v305: 960 pgs: 960 active+clean; 0 bytes data, 40147 MB used, 26667 GB / 26706 GB avail
> > > > > mdsmap e1: 0/0/1 up
> > > > >  
> > > > >  
> > > > > If I set the replica size to 0 (I know this makes no sense):
> > > > > ceph osd pool set rbd size 0
> > > > > and then back to 2:
> > > > > ceph osd pool set rbd size 2
> > > > > 
> > > > > then the ceph-osd process on every OSD crashes with a segfault.
> > > > > 
> > > > > If I stop the MON daemons I can start the OSDs, but as soon as I start the MONs again, all the OSDs die.
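 
(In other words, with the stock sysvinit script the observed sequence is roughly the following. This is a sketch and assumes "service ceph -a" can reach every host, as in the "service ceph -a start" above.)
 
service ceph -a stop mon     # with the monitors down, the OSDs can be started
service ceph -a start osd    # ...and they stay running
service ceph -a start mon    # as soon as the monitors return, every ceph-osd segfaults again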
> > > > >  
> > > > >  
> > > > >  
> > > > > How can I repair this behavior?
> > > > >  
> > > > >  
> > > > >  
> > > > >  
> > > > >  
> > > > > My setup is nothing special:
> > > > >  
> > > > > Centos 6.3
> > > > >  
> > > > > Kernel: 3.8.3-1.el6.elrepo.x86_64
> > > > >  
> > > > > ceph-fuse-0.56.4-0.el6.x86_64
> > > > > ceph-test-0.56.4-0.el6.x86_64
> > > > > libcephfs1-0.56.4-0.el6.x86_64
> > > > > ceph-0.56.4-0.el6.x86_64
> > > > > ceph-release-1-0.el6.noarch
> > > > >  
> > > > > cat /etc/ceph/ceph.conf
> > > > >  
> > > > > [global]
> > > > > auth cluster required = none
> > > > > auth service required = none
> > > > > auth client required = none
> > > > > keyring = /etc/ceph/$name.keyring
> > > > > [mon]
> > > > > [mds]
> > > > > [osd]
> > > > > osd journal size = 10000
> > > > > [mon.1]
> > > > > host = ceph-mon1
> > > > > mon addr = 10.200.20.1:6789
> > > > > [mon.2]
> > > > > host = ceph-mon2
> > > > > mon addr = 10.200.20.2:6789
> > > > > [mon.3]
> > > > > host = ceph-mon3
> > > > > mon addr = 10.200.20.3:6789
> > > > >  
> > > > > [osd.0]
> > > > > host = ceph-osd1
> > > > > [osd.1]
> > > > > host = ceph-osd2
> > > > > [osd.2]
> > > > > host = ceph-osd3
> > > > > [osd.3]
> > > > > host = ceph-osd4
> > > > >  
> > > > > [mds.a]
> > > > > host = ceph-mds1
> > > > > [mds.b]
> > > > > host = ceph-mds2
> > > > > [mds.c]
> > > > > host = ceph-mds3
> > > > >  
> > > > > Thanks much.
> > > > > --
> > > > > AIXIT GmbH - Witalij Poljatchek
> > > > > (T) +49 69 203 4709-13 - (F) +49 69 203 470 979
> > > > > wp@xxxxxxxxx - http://www.aixit.com
> > > > > 
> > > > > AIXIT GmbH
> > > > > Strahlenbergerstr. 14
> > > > > 63067 Offenbach am Main
> > > > > (T) +49 69 203 470 913
> > > > > 
> > > > > Amtsgericht Offenbach, HRB 43953
> > > > > Geschäftsführer: Friedhelm Heyer, Holger Grauer
> > > >  
> > >  
> > >  
> > >  
> > >  
> > >  
> > > --
> > > AIXIT GmbH - Witalij Poljatchek
> > > (T) +49 69 203 4709-13 - (F) +49 69 203 470 979
> > > wp@xxxxxxxxx - http://www.aixit.com
> > >  
> > > AIXIT GmbH
> > >  
> > > Strahlenbergerstr. 14
> > > 63067 Offenbach am Main
> > > (T) +49 69 203 470 913
> > >  
> > > Amtsgericht Offenbach, HRB 43953
> > > Geschäftsführer: Friedhelm Heyer, Holger Grauer
> >  
>  
>  
>  
> --  
> AIXIT GmbH - Witalij Poljatchek
> (T) +49 69 203 4709-13 - (F) +49 69 203 470 979
> wp@xxxxxxxxx - http://www.aixit.com
>  
> AIXIT GmbH
>  
> Strahlenbergerstr. 14
> 63067 Offenbach am Main
> (T) +49 69 203 470 913
>  
> Amtsgericht Offenbach, HRB 43953
> Geschäftsführer: Friedhelm Heyer, Holger Grauer
>  
>  
> Attachments:  
> - ceph-osd.0.log
>  
> - ceph-mon.1.log
>  
> - ceph.log
>  



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




