On 03/25/2014 08:50 PM, Olivier Bonvalet wrote:
> On Wednesday 26 March 2014 at 02:33 +0100, Olivier Bonvalet wrote:
>> Thanks for your patch.
>>
>> This is an output of a crash case :
>>
>> Mar 26 02:31:18 alg kernel: [ 965.366895] rbd_img_obj_callback: bad image object request information:
>> Mar 26 02:31:18 alg kernel: [ 965.366905] obj_request ffff880224bc9528
>> Mar 26 02:31:18 alg kernel: [ 965.366909] ->object_name <(null)>
>> Mar 26 02:31:18 alg kernel: [ 965.366913] ->offset 0
>> Mar 26 02:31:18 alg kernel: [ 965.366917] ->length 4096
>> Mar 26 02:31:18 alg kernel: [ 965.366921] ->type 0x1
>> Mar 26 02:31:18 alg kernel: [ 965.366925] ->flags 0x3
>> Mar 26 02:31:18 alg kernel: [ 965.366929] ->img_request (null)
>> Mar 26 02:31:18 alg kernel: [ 965.366933] ->which 4294967295
>> Mar 26 02:31:18 alg kernel: [ 965.366936] ->xferred 4096
>> Mar 26 02:31:18 alg kernel: [ 965.366940] ->result 0
>> Mar 26 02:31:18 alg kernel: [ 965.366943] ->kref 0
>> Mar 26 02:31:18 alg kernel: [ 965.366947] img_request ffff880222f4fb50
>> Mar 26 02:31:18 alg kernel: [ 965.366950] ->snap 0xfffffffffffffffe
>> Mar 26 02:31:18 alg kernel: [ 965.366954] ->offset 1417662464
>> Mar 26 02:31:18 alg kernel: [ 965.366957] ->length 16384
>> Mar 26 02:31:18 alg kernel: [ 965.366960] ->flags 0x0
>> Mar 26 02:31:18 alg kernel: [ 965.366963] ->obj_request_count 0
>> Mar 26 02:31:18 alg kernel: [ 965.366966] ->next_completion 2
>> Mar 26 02:31:18 alg kernel: [ 965.366969] ->xferred 16384
>> Mar 26 02:31:18 alg kernel: [ 965.366973] ->result 0
>> Mar 26 02:31:18 alg kernel: [ 965.366976] ->obj_requests head ffff880222f4fbb0
>> Mar 26 02:31:18 alg kernel: [ 965.366980] ->kref 0
>> Mar 26 02:31:18 alg kernel: [ 965.366985]
>> Mar 26 02:31:18 alg kernel: [ 965.366985] Assertion failure in rbd_img_obj_callback() at line 2165:
>> Mar 26 02:31:18 alg kernel: [ 965.366985]
>> Mar 26 02:31:18 alg kernel: [ 965.366985] rbd_assert(which == img_request->next_completion);
>> Mar 26 02:31:18 alg kernel: [ 965.366985]
>> Mar 26 02:31:18 alg kernel: [ 965.367185] ------------[ cut here ]------------
>> Mar 26 02:31:18 alg kernel: [ 965.367241] kernel BUG at drivers/block/rbd.c:2165!
>>
>>
>> I hope it can help.
>>
>>

Thanks for sending these.

>
> and a second one, very similar :
>
> Mar 26 02:48:27 alg kernel: [ 681.167833] rbd_img_obj_callback: bad image object request information:
> Mar 26 02:48:27 alg kernel: [ 681.167836] obj_request ffff88022e1e2828
> Mar 26 02:48:27 alg kernel: [ 681.167837] ->object_name <(null)>
> Mar 26 02:48:27 alg kernel: [ 681.167838] ->offset 0
> Mar 26 02:48:27 alg kernel: [ 681.167839] ->length 4096
> Mar 26 02:48:27 alg kernel: [ 681.167840] ->type 0x1
> Mar 26 02:48:27 alg kernel: [ 681.167840] ->flags 0x3
> Mar 26 02:48:27 alg kernel: [ 681.167841] ->img_request (null)
> Mar 26 02:48:27 alg kernel: [ 681.167842] ->which 4294967295
> Mar 26 02:48:27 alg kernel: [ 681.167843] ->xferred 4096
> Mar 26 02:48:27 alg kernel: [ 681.167844] ->result 0
> Mar 26 02:48:27 alg kernel: [ 681.167844] ->kref 0

This confirms the reference count of the object request has
gone to zero. This object request has already been destroyed
(yet we're handling a callback for it).

> Mar 26 02:48:27 alg kernel: [ 681.167845] img_request ffff88021f555f10
> Mar 26 02:48:27 alg kernel: [ 681.167846] ->snap 0xfffffffffffffffe
> Mar 26 02:48:27 alg kernel: [ 681.167847] ->offset 28072464384
> Mar 26 02:48:27 alg kernel: [ 681.167847] ->length 16384
> Mar 26 02:48:27 alg kernel: [ 681.167848] ->flags 0x0
> Mar 26 02:48:27 alg kernel: [ 681.167849] ->obj_request_count 0
> Mar 26 02:48:27 alg kernel: [ 681.167850] ->next_completion 2
> Mar 26 02:48:27 alg kernel: [ 681.167850] ->xferred 16384
> Mar 26 02:48:27 alg kernel: [ 681.167851] ->result 0
> Mar 26 02:48:27 alg kernel: [ 681.167852] ->obj_requests head ffff88021f555f70

The object request list is empty.

> Mar 26 02:48:27 alg kernel: [ 681.167853] ->kref 0

This confirms the reference count of the image request has
gone to zero. So not only has the object request already
completed, the image request has as well.

I'm almost done composing a very large e-mail with some
detailed analysis. No answer quite yet, but I am certain
that we're getting duplicate callbacks on the second object
request of an image request that spans two objects. That
should help narrow the search for the root cause.

-Alex

> Mar 26 02:48:27 alg kernel: [ 681.167854]
> Mar 26 02:48:27 alg kernel: [ 681.167854] Assertion failure in rbd_img_obj_callback() at line 2165:
> Mar 26 02:48:27 alg kernel: [ 681.167854]
> Mar 26 02:48:27 alg kernel: [ 681.167854] rbd_assert(which == img_request->next_completion);
> Mar 26 02:48:27 alg kernel: [ 681.167854]
> Mar 26 02:48:27 alg kernel: [ 681.168117] ------------[ cut here ]------------
> Mar 26 02:48:27 alg kernel: [ 681.168211] kernel BUG at drivers/block/rbd.c:2165!
>
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
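
For readers following the dumps, here is a rough userspace sketch of how a
duplicate callback could produce the state shown above (which 4294967295,
next_completion 2, obj_request_count 0). It is not the rbd driver code. The
field names which, next_completion and obj_request_count mirror the dump;
everything else (BAD_WHICH as a "detached" sentinel, obj_attach(),
obj_detach(), obj_callback(), duplicate_callback_scenario()) is a
hypothetical helper invented for illustration, and the teardown order is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel for "not attached"; prints as 4294967295 in the dump. */
#define BAD_WHICH UINT32_MAX

struct img_request {
	uint32_t obj_request_count;	/* object requests still attached */
	uint32_t next_completion;	/* index expected to complete next */
};

struct obj_request {
	struct img_request *img;	/* NULL once detached */
	uint32_t which;			/* position within the image request */
};

/* Attach obj as the next object request of img. */
static void obj_attach(struct img_request *img, struct obj_request *obj)
{
	obj->img = img;
	obj->which = img->obj_request_count++;
}

/* Detach obj, as teardown might do once the image request completes. */
static void obj_detach(struct img_request *img, struct obj_request *obj)
{
	obj->img = NULL;
	obj->which = BAD_WHICH;
	img->obj_request_count--;
}

/*
 * Simplified in-order check. Returns 0 when obj is the request expected
 * next, and -1 where rbd_assert(which == img_request->next_completion)
 * from the log would have fired.
 */
static int obj_callback(struct img_request *img, struct obj_request *obj)
{
	if (obj->which != img->next_completion)
		return -1;
	img->next_completion++;
	return 0;
}

/*
 * An image request spanning two objects: both callbacks arrive, the
 * request is torn down, then a duplicate callback arrives for the second
 * object. Returns the result of that duplicate callback.
 */
static int duplicate_callback_scenario(struct img_request *img,
				       struct obj_request objs[2])
{
	int first, second;

	img->obj_request_count = 0;
	img->next_completion = 0;
	obj_attach(img, &objs[0]);
	obj_attach(img, &objs[1]);

	first = obj_callback(img, &objs[0]);
	second = obj_callback(img, &objs[1]);
	assert(first == 0 && second == 0);	/* legitimate completions */

	obj_detach(img, &objs[1]);
	obj_detach(img, &objs[0]);

	return obj_callback(img, &objs[1]);	/* the duplicate */
}
```

In this model the duplicate callback sees a detached object request whose
which no longer matches next_completion, so the check fails after all the
bookkeeping has already been released. That is consistent with the dumps,
but it says nothing about where the duplicate callback actually comes from.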