Re: Throttle.cc: 194: FAILED assert(c >= 0) on snap rm or create in Ceph 0.87

This is happening on the client side? Can you provide the full
backtrace and a log with "debug objecter = 20" turned on?
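
(You can set that either in the [client] section of ceph.conf on the
machine running the rbd command, e.g.

    [client]
        debug objecter = 20
        log file = /var/log/ceph/client.$name.$pid.log

or pass it on the command line for a single run, something like
"rbd --debug-objecter=20 snap rm <pool>/<image>@<snap>". The log file
path above is just an example; put it wherever is convenient.)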

On Sun, Oct 21, 2018 at 11:25 AM Simon Ruggier <simon@xxxxxxxxxxx> wrote:
>
> Hi, I'm writing about a problem I'm seeing in a Ceph 0.87 cluster
> where rbd snap create, rm, etc. actually succeed, but abort with a
> non-zero return code because the notify call at the very end of the
> function (https://github.com/ceph/ceph/blob/v0.87/src/librbd/internal.cc#L468)
> hits an assertion failure (Throttle.cc: 194: FAILED assert(c >= 0)).
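>
> For reference, a typical run that hits it looks like this (pool and
> image names here are just placeholders):
>
>     $ rbd snap create rbd/myimage@snap1
>     ...
>     Throttle.cc: 194: FAILED assert(c >= 0)
>
> and the command exits non-zero even though the snapshot itself does
> get created.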
>
> I did a bit of digging, and found that c is calculated in
> calc_op_budget (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2453-L2471),
> which is called in Objecter::_take_op_budget
> (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L1597-L1608),
> but could hypothetically be called again in Objecter::_throttle_op
> (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2473-L2491),
> if the first calculation returned 0. Digging into the rd.notify
> call in IoCtxImpl::notify
> (https://github.com/ceph/ceph/blob/v0.87/src/librados/IoCtxImpl.cc#L1117),
> I can see that the call adds an op of type CEPH_OSD_OP_NOTIFY
> (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L865),
> which is defined at
> https://github.com/ceph/ceph/blob/v0.87/src/include/rados.h#L185. From
> that, we know it's the code path at
> https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2463-L2464
> that will be taken while calculating the budget, but from there I
> can't tell where or why extents would be set on a notify operation.
> I'm not familiar with the Ceph codebase, so this seemed like the
> right point to ask for advice from someone who actually understands
> this code.
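>
> In case it helps, here's my rough paraphrase of the read-op branch of
> calc_op_budget as I read it (not copied verbatim from Objecter.cc, so
> it may not match the file exactly):
>
>     // Rough paraphrase of Objecter::calc_op_budget (v0.87), read-op branch.
>     int calc_op_budget(Objecter::Op *op)
>     {
>       int op_budget = 0;
>       for (vector<OSDOp>::iterator i = op->ops.begin();
>            i != op->ops.end(); ++i) {
>         // CEPH_OSD_OP_NOTIFY is a read-mode data op, so it should land in
>         // this branch, where the budget charged is the op's extent length.
>         if (ceph_osd_op_mode_read(i->op.op) &&
>             ceph_osd_op_type_data(i->op.op)) {
>           if ((int64_t)i->op.extent.length > 0)
>             op_budget += (int64_t)i->op.extent.length;
>         }
>       }
>       // _take_op_budget() hands this value to the throttle, which is
>       // where the assert(c >= 0) fires.
>       return op_budget;
>     }
>
> Since the assert is on c >= 0, I'm presuming the budget somehow ends
> up negative for this op, but I can't see how that happens from the
> code above.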
>
> I also noticed the possibly related issue #9592
> (http://tracker.ceph.com/issues/9592), but I'm not totally sure it's
> the same problem; the reproduction process looks quite different.
>
> I'm not expecting any bugfixes for such an old version of Ceph, but
> I'd appreciate help understanding what's different about this
> particular volume and how to clean it up by hand. In the unlikely
> event that this is also a problem in the current development version
> of Ceph, perhaps this can be considered a bug report.


