Re: Issue #5876 : assertion failure in rbd_img_obj_callback()

On Tuesday, 25 March 2014 at 20:55 -0500, Alex Elder wrote:
> On 03/25/2014 08:50 PM, Olivier Bonvalet wrote:
> > On Wednesday, 26 March 2014 at 02:33 +0100, Olivier Bonvalet wrote:
> >> Thanks for your patch.
> >>
> >> Here is the output from one crash:
> >>
> >> Mar 26 02:31:18 alg kernel: [  965.366895] rbd_img_obj_callback: bad image object request information:
> >> Mar 26 02:31:18 alg kernel: [  965.366905] obj_request ffff880224bc9528
> >> Mar 26 02:31:18 alg kernel: [  965.366909]     ->object_name <(null)>
> >> Mar 26 02:31:18 alg kernel: [  965.366913]     ->offset 0
> >> Mar 26 02:31:18 alg kernel: [  965.366917]     ->length 4096
> >> Mar 26 02:31:18 alg kernel: [  965.366921]     ->type 0x1
> >> Mar 26 02:31:18 alg kernel: [  965.366925]     ->flags 0x3
> >> Mar 26 02:31:18 alg kernel: [  965.366929]     ->img_request           (null)
> >> Mar 26 02:31:18 alg kernel: [  965.366933]     ->which 4294967295
> >> Mar 26 02:31:18 alg kernel: [  965.366936]     ->xferred 4096
> >> Mar 26 02:31:18 alg kernel: [  965.366940]     ->result 0
> >> Mar 26 02:31:18 alg kernel: [  965.366943]     ->kref 0
> >> Mar 26 02:31:18 alg kernel: [  965.366947] img_request ffff880222f4fb50
> >> Mar 26 02:31:18 alg kernel: [  965.366950]     ->snap 0xfffffffffffffffe
> >> Mar 26 02:31:18 alg kernel: [  965.366954]     ->offset 1417662464
> >> Mar 26 02:31:18 alg kernel: [  965.366957]     ->length 16384
> >> Mar 26 02:31:18 alg kernel: [  965.366960]     ->flags 0x0
> >> Mar 26 02:31:18 alg kernel: [  965.366963]     ->obj_request_count 0
> >> Mar 26 02:31:18 alg kernel: [  965.366966]     ->next_completion 2
> >> Mar 26 02:31:18 alg kernel: [  965.366969]     ->xferred 16384
> >> Mar 26 02:31:18 alg kernel: [  965.366973]     ->result 0
> >> Mar 26 02:31:18 alg kernel: [  965.366976]     ->obj_requests head ffff880222f4fbb0
> >> Mar 26 02:31:18 alg kernel: [  965.366980]     ->kref 0
> >> Mar 26 02:31:18 alg kernel: [  965.366985] 
> >> Mar 26 02:31:18 alg kernel: [  965.366985] Assertion failure in rbd_img_obj_callback() at line 2165:
> >> Mar 26 02:31:18 alg kernel: [  965.366985] 
> >> Mar 26 02:31:18 alg kernel: [  965.366985] 	rbd_assert(which == img_request->next_completion);
> >> Mar 26 02:31:18 alg kernel: [  965.366985] 
> >> Mar 26 02:31:18 alg kernel: [  965.367185] ------------[ cut here ]------------
> >> Mar 26 02:31:18 alg kernel: [  965.367241] kernel BUG at drivers/block/rbd.c:2165!
> >>
> >>
> >> I hope it helps.
> >>
> >>
> 
> 
> Thanks for sending these.
> 
> > 
> > And here is a second one, very similar:
> > 
> > Mar 26 02:48:27 alg kernel: [  681.167833] rbd_img_obj_callback: bad image object request information:
> > Mar 26 02:48:27 alg kernel: [  681.167836] obj_request ffff88022e1e2828
> > Mar 26 02:48:27 alg kernel: [  681.167837]     ->object_name <(null)>
> > Mar 26 02:48:27 alg kernel: [  681.167838]     ->offset 0
> > Mar 26 02:48:27 alg kernel: [  681.167839]     ->length 4096
> > Mar 26 02:48:27 alg kernel: [  681.167840]     ->type 0x1
> > Mar 26 02:48:27 alg kernel: [  681.167840]     ->flags 0x3
> > Mar 26 02:48:27 alg kernel: [  681.167841]     ->img_request           (null)
> > Mar 26 02:48:27 alg kernel: [  681.167842]     ->which 4294967295
> > Mar 26 02:48:27 alg kernel: [  681.167843]     ->xferred 4096
> > Mar 26 02:48:27 alg kernel: [  681.167844]     ->result 0
> > Mar 26 02:48:27 alg kernel: [  681.167844]     ->kref 0
> 
> This confirms the reference count of the object request has gone
> to zero.  This object request has already been destroyed (yet
> we're handling a callback for it).
> 
> > Mar 26 02:48:27 alg kernel: [  681.167845] img_request ffff88021f555f10
> > Mar 26 02:48:27 alg kernel: [  681.167846]     ->snap 0xfffffffffffffffe
> > Mar 26 02:48:27 alg kernel: [  681.167847]     ->offset 28072464384
> > Mar 26 02:48:27 alg kernel: [  681.167847]     ->length 16384
> > Mar 26 02:48:27 alg kernel: [  681.167848]     ->flags 0x0
> > Mar 26 02:48:27 alg kernel: [  681.167849]     ->obj_request_count 0
> > Mar 26 02:48:27 alg kernel: [  681.167850]     ->next_completion 2
> > Mar 26 02:48:27 alg kernel: [  681.167850]     ->xferred 16384
> > Mar 26 02:48:27 alg kernel: [  681.167851]     ->result 0
> > Mar 26 02:48:27 alg kernel: [  681.167852]     ->obj_requests head ffff88021f555f70
> 
> The object request list is empty.
> 
> > Mar 26 02:48:27 alg kernel: [  681.167853]     ->kref 0
> 
> This confirms the reference count of the image request has gone
> to zero.  So not only has the object request already completed,
> the image request has as well.
> 
> I'm almost done composing a very large e-mail with some detailed
> analysis.  No answer quite yet, but I am certain that we're
> getting duplicate callbacks on the second object request of
> an image request that spans two objects.  That should help
> narrow the search for the root cause.
> 
> 					-Alex
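
To make the diagnosis above concrete, here is a minimal user-space sketch of the in-order completion check, assuming an image request that spans two objects and a second object request that gets called back twice. The struct layouts and the names `img_req`, `obj_req`, `obj_callback` and `replay_duplicate_callback` are hypothetical simplifications, not the rbd driver's code; only the `which == next_completion` comparison is taken from the assertion in the log. The model also skips the teardown step visible in the dumps, where `which` reads 4294967295 and `img_request` is already null.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the rbd structures; not kernel code. */
struct img_req {
    uint32_t obj_count;        /* object requests making up this image request */
    uint32_t next_completion;  /* index of the object request expected to complete next */
    int      kref;             /* reference count; 0 means already released */
};

struct obj_req {
    struct img_req *img;
    uint32_t which;            /* position of this object request in the image request */
    int      kref;
};

/*
 * Model of the completion callback. Returns true when the callback is
 * consistent with in-order completion, and false where the check
 * rbd_assert(which == img_request->next_completion) would fail.
 */
bool obj_callback(struct obj_req *obj)
{
    struct img_req *img = obj->img;

    if (obj->which != img->next_completion)
        return false;          /* duplicate or out-of-order callback */

    img->next_completion++;
    obj->kref--;               /* drop the reference held for this completion */
    if (img->next_completion == img->obj_count)
        img->kref--;           /* last object done: image request is released */
    return true;
}

/* Replays the suspected sequence: two objects, the second called back twice. */
bool replay_duplicate_callback(void)
{
    struct img_req img = { .obj_count = 2, .next_completion = 0, .kref = 1 };
    struct obj_req first  = { .img = &img, .which = 0, .kref = 1 };
    struct obj_req second = { .img = &img, .which = 1, .kref = 1 };

    bool ok1 = obj_callback(&first);   /* next_completion becomes 1 */
    bool ok2 = obj_callback(&second);  /* next_completion becomes 2, both krefs reach 0 */
    bool ok3 = obj_callback(&second);  /* duplicate: which (1) != next_completion (2) */

    return ok1 && ok2 && !ok3 &&
           img.next_completion == 2 && img.kref == 0 && second.kref == 0;
}
```

After the replay the model ends with `next_completion` at 2 and both reference counts at 0, which matches the `next_completion 2` and `kref 0` fields in the two dumps.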

Thanks again for taking the time to analyze this problem.

All my RBD images have daily snapshots. Could this bug be related to
snapshots?

Maybe it's a stupid question, but is there a workaround I could use
to mitigate the problem in production until a proper fix is found?
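
For readers decoding the dumps, two of the logged values look like sentinels rather than real data: `which 4294967295` is the all-ones 32-bit value, and `snap 0xfffffffffffffffe` is -2 as an unsigned 64-bit value. I believe these correspond to `BAD_WHICH` in rbd.c and `CEPH_NOSNAP` in the Ceph kernel headers, but treat those names as assumptions; nothing in this thread confirms them. If the second one is right, the image requests in both dumps target the image head rather than a snapshot, which by itself neither confirms nor rules out a link to snapshots. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a dumped value is the all-ones 32-bit pattern (2^32 - 1). */
bool is_all_ones_u32(uint64_t dumped)
{
    return dumped == UINT32_MAX;
}

/* True when a dumped value equals -2 reinterpreted as an unsigned 64-bit integer. */
bool is_minus_two_u64(uint64_t dumped)
{
    return dumped == (uint64_t)-2;
}
```

Feeding in the logged values, `is_all_ones_u32(4294967295ULL)` and `is_minus_two_u64(0xfffffffffffffffeULL)` both hold.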