On Tue, Mar 25, 2014 at 2:51 PM, Alex Elder <elder@xxxxxxxx> wrote:
> On 03/25/2014 07:34 AM, Ilya Dryomov wrote:
>>> On 03/25/2014 04:04 AM, Ilya Dryomov wrote:
>>>> On Tue, Mar 25, 2014 at 10:39 AM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> what can/should I do to help fix that problem ?
>>>>>
>>>>> for now, RBD kernel client hang on :
>>>>> Assertion failure in rbd_img_obj_callback() at line 2131:
>>>>>     rbd_assert(which >= img_request->next_completion);
>>>
>>> If you can build your own kernel as Ilya says I'd like to
>>> see the values of which and img_request->next_completion
>>> here.
>>
>> Looks like which was 1, which means that next_completion had to be 2 or
>> greater.  I miss solaris crash dumps ...
>>
>> On a different note, why are we asserting next_completion outside of
>> a spinlock which is supposed to protect next_completion?
>
> That's a very good point (which could be easily remedied by moving
> the assertion down a couple lines).  The image object request (#1)
> in this case will have been marked done at this point; it's possible
> that request #2 (or later) was concurrently getting handled by the
> for_each_obj_request_from() loop below in that same function, but
> may not have updated next_completion yet.
>
> So that *could* explain the tripped assertion.  The assertion
> should be moved in any case, it's a bug.
>
> That being said, it doesn't explain the other assertion:
>     rbd_assert(img_request != NULL);
> So there's at least one other thing going on.

Yeah, exactly my thoughts.

Thanks,

                Ilya
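
For illustration only, the locking point raised in the thread (checking next_completion only while holding the lock that protects it) could be sketched in userspace C roughly as below. This is not the rbd code. The struct, the function names, the C11 atomic_flag standing in for the kernel spinlock, and the choice to mark a request done under the same lock are all assumptions made for the sketch. It shows the pattern only and is not claimed to explain or resolve the hang discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_REQUESTS 8

/* Hypothetical stand-in for an image request; NOT the kernel struct. */
struct img_request_sketch {
    atomic_flag completion_lock;  /* plays the role of the spinlock */
    uint32_t next_completion;     /* protected by completion_lock */
    uint32_t obj_request_count;
    unsigned char done[SKETCH_MAX_REQUESTS]; /* done[i] != 0 once i finished */
};

static void sketch_init(struct img_request_sketch *img, uint32_t count)
{
    assert(count <= SKETCH_MAX_REQUESTS);
    atomic_flag_clear(&img->completion_lock);
    img->next_completion = 0;
    img->obj_request_count = count;
    memset(img->done, 0, sizeof(img->done));
}

static void sketch_lock(struct img_request_sketch *img)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&img->completion_lock,
                                             memory_order_acquire))
        ;
}

static void sketch_unlock(struct img_request_sketch *img)
{
    atomic_flag_clear_explicit(&img->completion_lock, memory_order_release);
}

/*
 * Completion callback for object request number `which`.
 * Returns how many requests this call completed in order.
 */
static uint32_t sketch_obj_callback(struct img_request_sketch *img,
                                    uint32_t which)
{
    uint32_t completed = 0;

    sketch_lock(img);

    /* The checks read next_completion only while the lock is held. */
    assert(which < img->obj_request_count);
    assert(which >= img->next_completion);

    /* Sketch simplification: the done flag is set under the same lock. */
    img->done[which] = 1;

    /* Only the request at the head advances next_completion. */
    if (which == img->next_completion) {
        while (img->next_completion < img->obj_request_count &&
               img->done[img->next_completion]) {
            img->next_completion++;
            completed++;
        }
    }

    sketch_unlock(img);
    return completed;
}
```

With three requests finishing in the order 1, 0, 2, the first call completes nothing in order, the second advances next_completion past both 0 and 1, and the third completes the last request.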