Re: Issue #5876 : assertion failure in rbd_img_obj_callback()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Le mardi 25 mars 2014 à 19:00 -0500, Alex Elder a écrit :
> On 03/25/2014 06:04 PM, Olivier Bonvalet wrote:
> > Le mardi 25 mars 2014 à 17:46 -0500, Alex Elder a écrit :
> >> On 03/25/2014 05:17 PM, Olivier Bonvalet wrote:
> >>>
> >>> I now have this one very often (here 5 minutes after the host boot) :
> >>
> >> I am fairly sure this indicates a use-after-free scenario,
> >> likely caused by something getting deleted before every
> >> user was done with it.
> >>
> >> I believe Ilya is done for the night; I'm going to spend some
> >> time looking at this to try to determine the cause.  If you
> >> are willing I'd love to have you try whatever fix I come up
> >> with.  I'd rather find a fix than just collect more information,
> >> but I may need to get more, we'll see.
> >>
> >> Thank you for all your reports, they help a lot.
> >>
> >> 					-Alex
> > 
> > Ok. I can apply some patch to help debug that yes.
> > I will try to reproduce on a different host, without customer data.
> > 
> > But I think I will stop here for the night too.
> > 
> > Thanks for your time,
> > Olivier
> 
> Here's something that will provide a few more pieces of
> information.  If you're around and you're able to try it
> out it might confirm something had likely been destroyed.
> 
> I'll keep sending stuff as I come up with it (even though
> I realize you may not be around until morning).
> 
> 					-Alex
> 
> 
> Index: b/drivers/block/rbd.c
> ===================================================================
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -2132,6 +2132,35 @@ static void rbd_img_obj_callback(struct
>  	spin_lock_irq(&img_request->completion_lock);
>  	if (which > img_request->next_completion)
>  		goto out;
> +	if (which != img_request->next_completion) {
> +		printk("%s: bad image object request information:\n", __func__);
> +		printk("obj_request %p\n", obj_request);
> +		printk("    ->object_name <%s>\n", obj_request->object_name);
> +		printk("    ->offset %llu\n", obj_request->offset);
> +		printk("    ->length %llu\n", obj_request->length);
> +		printk("    ->type 0x%x\n", (u32)obj_request->type);
> +		printk("    ->flags 0x%lx\n", obj_request->flags);
> +		printk("    ->img_request %p\n", obj_request->img_request);
> +		printk("    ->which %u\n", obj_request->which);
> +		printk("    ->xferred %llu\n", obj_request->xferred);
> +		printk("    ->result %d\n", obj_request->result);
> +		printk("    ->kref %d\n", atomic_read(&obj_request->kref));
> +
> +		printk("img_request %p\n", img_request);
> +		printk("    ->snap 0x%016llx\n", img_request->snap_id);
> +		printk("    ->offset %llu\n", img_request->offset);
> +		printk("    ->length %llu\n", img_request->length);
> +		printk("    ->flags 0x%lx\n", img_request->flags);
> +		printk("    ->obj_request_count %u\n",
> +			img_request->obj_request_count);
> +		printk("    ->next_completion %u\n",
> +			img_request->next_completion);
> +		printk("    ->xferred %llu\n", img_request->xferred);
> +		printk("    ->result %d\n", img_request->result);
> +		printk("    ->obj_requests head %p\n",
> +			img_request->obj_requests.next);
> +		printk("    ->kref %d\n", atomic_read(&img_request->kref));
> +	}
>  	rbd_assert(which == img_request->next_completion);
> 
>  	for_each_obj_request_from(img_request, obj_request) {
> 
> 

Thanks for your patch.

This is an output of a crash case :

Mar 26 02:31:18 alg kernel: [  965.366895] rbd_img_obj_callback: bad image object request information:
Mar 26 02:31:18 alg kernel: [  965.366905] obj_request ffff880224bc9528
Mar 26 02:31:18 alg kernel: [  965.366909]     ->object_name <(null)>
Mar 26 02:31:18 alg kernel: [  965.366913]     ->offset 0
Mar 26 02:31:18 alg kernel: [  965.366917]     ->length 4096
Mar 26 02:31:18 alg kernel: [  965.366921]     ->type 0x1
Mar 26 02:31:18 alg kernel: [  965.366925]     ->flags 0x3
Mar 26 02:31:18 alg kernel: [  965.366929]     ->img_request           (null)
Mar 26 02:31:18 alg kernel: [  965.366933]     ->which 4294967295
Mar 26 02:31:18 alg kernel: [  965.366936]     ->xferred 4096
Mar 26 02:31:18 alg kernel: [  965.366940]     ->result 0
Mar 26 02:31:18 alg kernel: [  965.366943]     ->kref 0
Mar 26 02:31:18 alg kernel: [  965.366947] img_request ffff880222f4fb50
Mar 26 02:31:18 alg kernel: [  965.366950]     ->snap 0xfffffffffffffffe
Mar 26 02:31:18 alg kernel: [  965.366954]     ->offset 1417662464
Mar 26 02:31:18 alg kernel: [  965.366957]     ->length 16384
Mar 26 02:31:18 alg kernel: [  965.366960]     ->flags 0x0
Mar 26 02:31:18 alg kernel: [  965.366963]     ->obj_request_count 0
Mar 26 02:31:18 alg kernel: [  965.366966]     ->next_completion 2
Mar 26 02:31:18 alg kernel: [  965.366969]     ->xferred 16384
Mar 26 02:31:18 alg kernel: [  965.366973]     ->result 0
Mar 26 02:31:18 alg kernel: [  965.366976]     ->obj_requests head ffff880222f4fbb0
Mar 26 02:31:18 alg kernel: [  965.366980]     ->kref 0
Mar 26 02:31:18 alg kernel: [  965.366985] 
Mar 26 02:31:18 alg kernel: [  965.366985] Assertion failure in rbd_img_obj_callback() at line 2165:
Mar 26 02:31:18 alg kernel: [  965.366985] 
Mar 26 02:31:18 alg kernel: [  965.366985] 	rbd_assert(which == img_request->next_completion);
Mar 26 02:31:18 alg kernel: [  965.366985] 
Mar 26 02:31:18 alg kernel: [  965.367185] ------------[ cut here ]------------
Mar 26 02:31:18 alg kernel: [  965.367241] kernel BUG at drivers/block/rbd.c:2165!


I hope it can help.


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux