Re: RBD hard crash on kernel 3.10

On Mon, Apr 13, 2015 at 10:18 PM, Shawn Edwards <lesser.evil@xxxxxxxxx> wrote:
> Here's a vmcore, along with log files from Xen's crash dump utility.
>
> https://drive.google.com/file/d/0Bz8b7ZiWX00AeHRhMjNvdVNLdDQ/view?usp=sharing
>
> Let me know if we can help more.
>
> On Fri, Apr 10, 2015 at 1:04 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Fri, Apr 10, 2015 at 8:03 PM, Shawn Edwards <lesser.evil@xxxxxxxxx> wrote:
>> > I took the rbd and ceph drivers out of the patched kernel above and merged
>> > them into Xen's kernel.  Works as well as the old one; still crashes.  But
>> > now I get logs.  From the Xen logs:
>> >
>> > [   1128.217561]    ERR:
>> > Assertion failure in rbd_img_obj_callback() at line 2363:
>> >
>> > rbd_assert(more ^ (which == img_request->obj_request_count));
>>
>> Ah, that's a long-standing bug which we know wasn't properly fixed -
>> a tight race in the rbd completion callback.  It looks like it doesn't take
>> long for you to reproduce it.  Can you try grabbing a vmcore such that
>> it can be inspected with the crash utility?

On closer inspection, that looks like a simple error-handling bug.  The
out-of-memory splat before the assert sets ->result to -ENOMEM, and the logic
in rbd_img_obj_callback() fails to handle it.  I'll fix it later this week.
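
For readers following along, here is a minimal userspace sketch of the
invariant that assertion checks.  It is not the kernel code: the struct, the
drain() helper, its fail_at parameter, and the way an error clears "more" are
all assumptions made for illustration.  Only the XOR expression itself comes
from the log above.

```c
#include <assert.h>
#include <stdbool.h>

/* The asserted invariant: exactly one of "more work is expected" and
 * "every object request has completed" may be true at a time. */
static bool invariant_holds(bool more, unsigned which, unsigned count)
{
    return more ^ (which == count);
}

/* Hypothetical result of draining completed object requests. */
struct outcome {
    bool more;      /* does the model still expect further completions? */
    unsigned which; /* index of the next object request to complete */
};

/* Simplified model (an assumption, not rbd's logic): complete objects in
 * order; the object at index fail_at (or none if -1) reports an error,
 * after which the model stops expecting more work even though objects
 * may remain. */
static struct outcome drain(unsigned count, int fail_at)
{
    struct outcome o = { .more = true, .which = 0 };

    while (o.which < count && o.more) {
        bool failed = ((int)o.which == fail_at);

        o.which++;
        o.more = !failed && o.which < count;
    }
    return o;
}
```

In this model, drain(3, -1) ends with more == false and which == 3, so the
invariant holds.  drain(3, 0) ends with more == false and which == 1, so both
sides of the XOR are false and the check fails, which is the general shape of
an error path tripping an assertion written only for the success path.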

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



