Re: Throttle.cc: 194: FAILED assert(c >= 0) on snap rm or create in Ceph 0.87

First of all, thanks for your reply.

Yeah, this is happening within the process executing the rbd command.
Sorry I didn't include the backtrace in my original email, I
completely forgot after putting together the rest of it.

I set "debug objecter = 20" in the local ceph config file on the
system I ran these commands on, then ran rbd snap create, snap ls, and
snap rm, so you could look at debug output from any of those
three. I saved the entire session, anonymized all names in the output,
and compressed it. See attached. If you need any other information,
let me know and I'll collect it when I'm able to.

On Fri, Oct 26, 2018 at 5:19 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> This is happening on the client side? Can you provide the full
> backtrace and a log with "debug objecter = 20" turned on?
>
> On Sun, Oct 21, 2018 at 11:25 AM Simon Ruggier <simon@xxxxxxxxxxx> wrote:
> >
> > Hi, I'm writing about a problem I'm seeing in a Ceph 0.87 cluster
> > where rbd snap create, rm, etc. are succeeding, but aborting with a
> > non-zero return code because the notify call at the very end of the
> > function (https://github.com/ceph/ceph/blob/v0.87/src/librbd/internal.cc#L468)
> > is hitting an assertion failure (Throttle.cc: 194: FAILED assert(c >= 0)).
> >
> > I did a bit of digging, and found that c is calculated in
> > calc_op_budget (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2453-L2471),
> > which is called in Objecter::_take_op_budget
> > (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L1597-L1608),
> > but could hypothetically be called again in Objecter::_throttle_op
> > (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2473-L2491),
> > if the first calculation returned 0. Digging into the rd.notify
> > call in IoCtxImpl.notify
> > (https://github.com/ceph/ceph/blob/v0.87/src/librados/IoCtxImpl.cc#L1117),
> > I can see that the call adds an op of type CEPH_OSD_OP_NOTIFY
> > (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L865),
> > which is defined at
> > https://github.com/ceph/ceph/blob/v0.87/src/include/rados.h#L185. From
> > that, we know that it's the code path at
> > https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2463-L2464
> > that will be taken while calculating the budget, but from there I
> > can't tell where or why there would be extents set on a notify
> > operation. I'm not familiar with the Ceph codebase, so that's the
> > point where I figured I should ask for some advice about this from
> > someone who actually understands this stuff.
> >
> > I also noticed the possibly related issue #9592
> > (http://tracker.ceph.com/issues/9592), but I'm not sure it's the
> > same issue, since its reproduction process looks quite different.
> >
> > I'm not expecting any bugfixes for such an old version of Ceph, but
> > I'd appreciate help just understanding what's different with this
> > particular volume and how to clean it up by hand, and in the unlikely
> > event that this is a problem in the current development version of
> > Ceph, perhaps this can be considered a bug report.
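To make the failure mode discussed in the quoted report concrete, here is a minimal C++ sketch under stated assumptions: a budget is summed from per-op lengths and then handed to a throttle whose precondition is c >= 0. `SketchOp`, `sketch_calc_op_budget`, and `throttle_precondition_holds` are hypothetical names, not Ceph's API, and narrowing the total to `int` is an assumption made for illustration. This does not establish what Ceph 0.87's calc_op_budget actually does with a notify op.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified op record. This is NOT Ceph's ceph_osd_op; it only
// keeps the two properties the report relies on: whether the op falls in the
// read-mode/data-type branch (the report says NOTIFY does) and a 64-bit length.
struct SketchOp {
  bool read_data;   // true if the op would take the read/data budget branch
  uint64_t length;  // stands in for a 64-bit extent length field
};

// Hypothetical budget calculation: sum lengths of read/data ops in unsigned
// 64-bit arithmetic (wraps instead of overflowing), then narrow to int, the
// type ASSUMED here for the value passed to the throttle.
int sketch_calc_op_budget(const std::vector<SketchOp>& ops) {
  uint64_t total = 0;
  for (const SketchOp& op : ops) {
    if (op.read_data) {
      total += op.length;
    }
  }
  // Values above INT_MAX wrap modulo 2^32: guaranteed since C++20,
  // implementation-defined (two's complement in practice) before that.
  return static_cast<int>(total);
}

// Stand-in for the Throttle.cc precondition: reports whether
// assert(c >= 0) would hold, instead of aborting.
bool throttle_precondition_holds(int c) { return c >= 0; }
```

With this sketch, a single read/data op of length 4096 gives a budget of 4096, while one carrying 0x80000000 gives a negative `int`, which is the kind of value the assertion rejects. Whether a real notify op can carry such a length depends on which fields its on-wire encoding sets, which is the open question in the report.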

Attachment: debug-objecter-20-log-anonymized.bz2
Description: application/bzip

