This is happening on the client side? Can you provide the full
backtrace and a log with "debug objecter = 20" turned on?

On Sun, Oct 21, 2018 at 11:25 AM Simon Ruggier <simon@xxxxxxxxxxx> wrote:
>
> Hi, I'm writing about a problem I'm seeing in a Ceph 0.87 cluster
> where rbd snap create, rm, etc. are succeeding, but aborting with a
> non-zero return code because the notify call at the very end of the
> function (https://github.com/ceph/ceph/blob/v0.87/src/librbd/internal.cc#L468)
> is hitting an assertion failure
> (Throttle.cc: 194: FAILED assert(c >= 0)).
>
> I did a bit of digging, and found that c is calculated in
> calc_op_budget (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2453-L2471),
> which is called in Objecter::_take_op_budget
> (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L1597-L1608),
> but could hypothetically be called again in Objecter::_throttle_op
> (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2473-L2491),
> if the first calculation returned 0. From diving into the rd.notify
> call in IoCtxImpl.notify
> (https://github.com/ceph/ceph/blob/v0.87/src/librados/IoCtxImpl.cc#L1117),
> I can see that the call adds an op of type CEPH_OSD_OP_NOTIFY
> (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L865),
> which is defined at
> https://github.com/ceph/ceph/blob/v0.87/src/include/rados.h#L185. From
> that, we know that it's the code path at
> https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2463-L2464
> that will be taken while calculating the budget, but from there I
> can't tell where or why there would be extents set on a notify
> operation. I'm not familiar with the Ceph codebase, so that's the
> point where I figured I should ask for some advice about this from
> someone who actually understands this stuff.
>
> I also noticed the possibly related issue #9592
> (http://tracker.ceph.com/issues/9592), but I'm not totally sure if
> it's the same issue, it looks like a pretty different reproduction
> process.
>
> I'm not expecting any bugfixes for such an old version of Ceph, but
> I'd appreciate help just understanding what's different with this
> particular volume and how to clean it up by hand, and in the unlikely
> event that this is a problem in the current development version of
> Ceph, perhaps this can be considered a bug report.
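
To illustrate the question of where an extent length could come from on a notify op, here is a rough Python model of one possible explanation. It is a sketch under assumptions, not a confirmed diagnosis and not the actual Ceph code. It assumes that the per-op parameters share storage in a union, so that reading the extent length of a notify op returns whatever the watch/notify fields hold, and that the budget is accumulated in a 32-bit signed int. The class names, the two-field layouts, and the helpers read_data_budget and take are all hypothetical stand-ins; the value 0x80000000 is made up for the demonstration.

```python
import ctypes


# Hypothetical, simplified stand-ins for the per-op parameter layouts.
class Extent(ctypes.Structure):
    _fields_ = [("offset", ctypes.c_uint64), ("length", ctypes.c_uint64)]


class Watch(ctypes.Structure):
    _fields_ = [("cookie", ctypes.c_uint64), ("ver", ctypes.c_uint64)]


# Assumption: both layouts occupy the same bytes, as members of a union do.
class OpParams(ctypes.Union):
    _fields_ = [("extent", Extent), ("watch", Watch)]


def read_data_budget(params):
    """Model the read/data branch: use extent.length as the budget.

    c_int32 keeps only the low 32 bits and reinterprets them as signed,
    which is what a narrowing conversion to a 32-bit int usually does.
    """
    return ctypes.c_int32(params.extent.length).value


def take(c):
    """Model the throttle check behind the reported assertion."""
    if c < 0:
        raise AssertionError("FAILED assert(c >= 0)")
    return c


if __name__ == "__main__":
    params = OpParams()
    # Fill in the op as a notify would: only the watch fields are set.
    params.watch.cookie = 1
    params.watch.ver = 0x80000000  # made-up value with bit 31 set

    # The same bytes, read back through the extent view.
    print(hex(params.extent.length))  # 0x80000000
    budget = read_data_budget(params)
    print(budget)  # -2147483648
    try:
        take(budget)
    except AssertionError as err:
        print(err)  # FAILED assert(c >= 0)
```

If this model matches what the client is doing, the failure would depend on the values stored in the notify fields for that particular image rather than on any real extent, which might explain why only one volume is affected. The backtrace and the "debug objecter = 20" log requested above would be the way to confirm or rule this out.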