Re: Issue #5876 : assertion failure in rbd_img_obj_callback()

On Friday, 4 April 2014 at 20:57 -0500, Alex Elder wrote:
> On 04/04/2014 08:16 PM, Olivier Bonvalet wrote:
> > On Tuesday, 25 March 2014 at 09:39 +0100, Olivier Bonvalet wrote:
> >> Hi,
> >>
> >> What can/should I do to help fix that problem?
> >>
> >> For now, the RBD kernel client hangs on:
> >>         Assertion failure in rbd_img_obj_callback() at line 2131:
> >>            rbd_assert(which >= img_request->next_completion);
> >>
> >> or on:
> >>         Assertion failure in rbd_img_obj_callback() at line 2127:
> >>             rbd_assert(img_request != NULL);
> >>
> >>
> >> I hit both cases at least once per week, on the latest 3.13.5 kernels.
> >>
> >> It seems that the problem occurs only on the more heavily loaded
> >> servers (I have 4 nearly identical servers, and the crash occurs on
> >> two of them. If I move the VM, the crash follows...).
> >>
> >> Olivier
> >>
> >> --
> > 
> > Hi,
> > 
> > So, after some days without any problems, RBD crashed tonight:
> 
> Unfortunately this could be a symptom of the same sort of race.
> When an object request is removed from its image request's list
> the request count gets decremented.  To be honest, all of these
> assertions in rbd_img_obj_callback() are probably unsafe, at
> least until I get the patch that does proper reference counting
> implemented:
> 
>         rbd_assert(img_request != NULL);
>         rbd_assert(img_request->obj_request_count > 0);
>         rbd_assert(which != BAD_WHICH);
>         rbd_assert(which < img_request->obj_request_count);
> 
> Until then I think you can avoid this by commenting out those
> assertions.  I'm afraid there will remain a (smaller) window
> of opportunity for a problem to occur, but I believe commenting
> those out will help for now.
> 
> I'm very sorry you're hitting these.  I'll see if I can get
> a comprehensive fix this weekend.
> 
> 					-Alex
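
For readers following the thread, the reference-counting idea mentioned above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel patch: `struct img_req`, `img_req_get()`, `img_req_put()` and `obj_callback()` are hypothetical names, and the sketch assumes the object count is fixed at submission rather than decremented as requests are removed. Under those assumptions, each completion callback holds its own reference, so the image request cannot be freed or shrunk while the callback's assertions run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an image request; NOT the kernel's structure. */
struct img_req {
    atomic_int refs;       /* one reference per holder (submitter, each callback) */
    int obj_count;         /* assumption: fixed at submission, never decremented */
    atomic_int completed;  /* number of object requests completed so far */
};

/* Demonstration-only counter of how many image requests were freed. */
static atomic_int img_req_freed;

static struct img_req *img_req_create(int obj_count)
{
    struct img_req *img = malloc(sizeof(*img));

    if (!img)
        return NULL;
    atomic_init(&img->refs, 1);      /* the submitter's own reference */
    img->obj_count = obj_count;
    atomic_init(&img->completed, 0);
    return img;
}

/* Take one extra reference, e.g. on behalf of an in-flight object request. */
static void img_req_get(struct img_req *img)
{
    atomic_fetch_add(&img->refs, 1);
}

/* Drop one reference; returns true if this call freed the image request. */
static bool img_req_put(struct img_req *img)
{
    if (atomic_fetch_sub(&img->refs, 1) == 1) {
        free(img);
        atomic_fetch_add(&img_req_freed, 1);
        return true;
    }
    return false;
}

/*
 * Completion callback for object request number `which`.
 * The caller must hold a reference taken with img_req_get() at submission,
 * so img stays valid and obj_count stays stable while the assertions run.
 */
static bool obj_callback(struct img_req *img, int which)
{
    assert(img != NULL);
    assert(which >= 0 && which < img->obj_count);
    atomic_fetch_add(&img->completed, 1);
    return img_req_put(img);         /* release this callback's reference */
}
```

In this sketch the submitter creates the request, calls `img_req_get()` once per object request, and then drops its own reference with `img_req_put()`; whichever callback finishes last frees the structure, regardless of completion order.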

Thanks for your help, really.

By removing those asserts, do I risk any data corruption?

Olivier
