Re: Assertion in v0.40 - os/FileStore.cc: 2438: FAILED assert(0 == "unexpected error")

On Sun, 15 Jan 2012, Martin Mailand wrote:
> Hi Sage,
> that's exactly what I did, the first two crashes are in this log,
> unfortunately there was no debug level set.

Whoops, right... the (old) replay messages above confused me.

There are a couple of possibilities here.  One is that the recovery code went 
in the wrong order.  I'm a bit skeptical, though, and even if it did, this 
code was mostly just rewritten in wip-backfill for 0.41, so I don't think it's 
worth debugging.  We have some tools to hammer on the snapshot + 
recovery code, but they aren't in the regular qa rotation yet.

More likely is that the SnapSet notion of clone_overlap got out of sync 
with the actual clones.  To check that, we need a dump of the xattrs on 
the _head object.  File sizes and attrs for the clones would help too.
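For anyone collecting that information, here is a minimal Python sketch that prints the size and raw xattrs of each object file given on the command line. It assumes the OSD is stopped and that you locate the _head and clone files under the FileStore data directory yourself. It does not know Ceph's on-disk naming and does not decode the SnapSet; it only hex-dumps the raw attribute values. The function names are hypothetical.

```python
import errno
import os
import sys


def dump_object_attrs(path):
    """Return (size_in_bytes, {xattr_name: raw_bytes}) for one object file."""
    size = os.stat(path).st_size
    attrs = {}
    try:
        names = os.listxattr(path)
    except OSError as e:
        # Filesystem without xattr support: report the size only.
        if e.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
            return size, attrs
        raise
    for name in names:
        attrs[name] = os.getxattr(path, name)
    return size, attrs


def main(paths):
    for path in paths:
        size, attrs = dump_object_attrs(path)
        print(f"{path}: {size} bytes")
        for name, value in sorted(attrs.items()):
            # Raw value as hex; decoding the SnapSet is left to Ceph's own tools.
            print(f"  {name} ({len(value)} bytes): {value.hex()}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it against the same object on each replica gives the clone sizes side by side.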

Are the _2 clone objects on other replicas 4MB or 23 bytes?

Is this keeping your cluster down?  If so, you can work around the problem 
by making the _2 clone object 4MB so that replay will succeed.  Just be
aware that that rbd image's content will be corrupted.  :/
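If you do take that workaround, one way to pad the clone is to extend the file with zeros. This is a sketch only. It assumes the OSD is stopped, that you have backed up the file, and that "4MB" means 4 MiB (4194304 bytes). The helper name pad_clone is hypothetical. The padded range reads back as zeros, which is the corruption mentioned above.

```python
import os
import sys

# Assumption: "4MB" in the thread means 4 MiB.
FOUR_MIB = 4 * 1024 * 1024


def pad_clone(path, size=FOUR_MIB):
    """Extend path to `size` bytes with zeros; never shrinks the file.

    Returns the resulting file size.
    """
    current = os.stat(path).st_size
    if current >= size:
        return current
    # os.truncate past EOF extends the file; the new bytes read as zeros.
    os.truncate(path, size)
    return os.stat(path).st_size


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{p}: {pad_clone(p)} bytes")
```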

sage


> 
> http://85.214.49.87/ceph/osd.0.full.log.bz2
> 
> -martin
> 
> 
> 
> Am 15.01.2012 03:45, schrieb Sage Weil:
> > Hi Martin-
> > 
> > On Sat, 14 Jan 2012, Martin Mailand wrote:
> > 
> > > Hi
> > > One of four OSDs died during the update to v0.40 with an assertion:
> > > os/FileStore.cc: 2438: FAILED assert(0 == "unexpected error")
> > > Even after a complete shutdown of the cluster and a restart with all
> > > OSDs at the same version, this OSD did not start.
> > > 
> > > The OSD log is attached.
> > 
> > It's trying to replay a transaction that appears to be invalid because the
> > .2 clone is smaller than it thinks.  Is this the first time the OSD
> > crashed, or did it crash once, and you cranked up logs and generated
> > this one?  If you have the previous log, that would be helpful... it
> > should have a similar transaction dump but a different stack trace.
> > 
> > Also, are any of the 6 patches on top of 0.40 related to the filestore or
> > osd?
> > 
> > Thanks!
> > sage
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

