On 03/25/2014 06:04 PM, Olivier Bonvalet wrote:
> On Tuesday, 25 March 2014 at 17:46 -0500, Alex Elder wrote:
>> On 03/25/2014 05:17 PM, Olivier Bonvalet wrote:
>>>
>>> I now have this one very often (here 5 minutes after the host boot) :
>>
>> I am fairly sure this indicates a use-after-free scenario,
>> likely caused by something getting deleted before every
>> user was done with it.
>>
>> I believe Ilya is done for the night; I'm going to spend some
>> time looking at this to try to determine the cause. If you
>> are willing I'd love to have you try whatever fix I come up
>> with. I'd rather find a fix than just collect more information,
>> but I may need to get more, we'll see.
>>
>> Thank you for all your reports, they help a lot.
>>
>> -Alex
>
> Ok. I can apply some patch to help debug that yes.
> I will try to reproduce on a different host, without customer data.
>
> But I think I will stop here for the night too.
>
> Thanks for your time,
> Olivier

Here's something that will provide a few more pieces of
information. If you're around and you're able to try it
out it might confirm something had likely been destroyed.

I'll keep sending stuff as I come up with it (even though
I realize you may not be around until morning).

-Alex

Index: b/drivers/block/rbd.c
===================================================================
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2132,6 +2132,35 @@ static void rbd_img_obj_callback(struct
 	spin_lock_irq(&img_request->completion_lock);
 	if (which > img_request->next_completion)
 		goto out;
+	if (which != img_request->next_completion) {
+		printk("%s: bad image object request information:\n", __func__);
+		printk("obj_request %p\n", obj_request);
+		printk(" ->object_name <%s>\n", obj_request->object_name);
+		printk(" ->offset %llu\n", obj_request->offset);
+		printk(" ->length %llu\n", obj_request->length);
+		printk(" ->type 0x%x\n", (u32)obj_request->type);
+		printk(" ->flags 0x%lx\n", obj_request->flags);
+		printk(" ->img_request %p\n", obj_request->img_request);
+		printk(" ->which %u\n", obj_request->which);
+		printk(" ->xferred %llu\n", obj_request->xferred);
+		printk(" ->result %d\n", obj_request->result);
+		printk(" ->kref %d\n", atomic_read(&obj_request->kref));
+
+		printk("img_request %p\n", img_request);
+		printk(" ->snap 0x%016llx\n", img_request->snap_id);
+		printk(" ->offset %llu\n", img_request->offset);
+		printk(" ->length %llu\n", img_request->length);
+		printk(" ->flags 0x%lx\n", img_request->flags);
+		printk(" ->obj_request_count %u\n",
+			img_request->obj_request_count);
+		printk(" ->next_completion %u\n",
+			img_request->next_completion);
+		printk(" ->xferred %llu\n", img_request->xferred);
+		printk(" ->result %d\n", img_request->result);
+		printk(" ->obj_requests head %p\n",
+			img_request->obj_requests.next);
+		printk(" ->kref %d\n", atomic_read(&img_request->kref));
+	}
 	rbd_assert(which == img_request->next_completion);
 
 	for_each_obj_request_from(img_request, obj_request) {
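
For anyone following along without the source handy, here is a rough
user-space sketch of the ordering check the patch instruments. It is not
the rbd code: struct img_req, obj_callback() and MAX_OBJ are hypothetical
names made up for illustration, and locking and reference counting are
left out. The idea it tries to show is that a callback arriving early
(which > next_completion) just returns, so by the time the assertion is
reached the only surprising case is which < next_completion, which would
suggest a stale or already-completed request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_OBJ 8	/* hypothetical upper bound, for the sketch only */

/* Hypothetical, simplified stand-in for the image request state. */
struct img_req {
	unsigned int obj_count;		/* number of object requests */
	unsigned int next_completion;	/* lowest index not yet reported */
	bool done[MAX_OBJ];		/* done[i]: object request i has finished */
};

/*
 * Called when object request `which` finishes.  Returns how many object
 * requests this call reported in order (0 if `which` arrived ahead of
 * its turn and has to wait for an earlier one).
 */
unsigned int obj_callback(struct img_req *img, unsigned int which)
{
	unsigned int reported = 0;

	assert(which < img->obj_count);
	img->done[which] = true;

	/* Early arrival: a later callback will pick this one up. */
	if (which > img->next_completion)
		return 0;

	/* The case the debug patch dumps state for before asserting. */
	if (which != img->next_completion) {
		fprintf(stderr, "bad completion: which %u next_completion %u\n",
			which, img->next_completion);
		assert(which == img->next_completion);
	}

	/* Report this request and any already-finished successors, in order. */
	while (img->next_completion < img->obj_count &&
	       img->done[img->next_completion]) {
		img->next_completion++;
		reported++;
	}
	return reported;
}
```

With three object requests finishing in the order 2, 1, 0, the first two
calls report nothing and the last one reports all three. Calling it a
second time with an index that has already been reported trips the
assertion, which is the situation the extra printk output is meant to
describe.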