Re: Throttle.cc: 194: FAILED assert(c >= 0) on snap rm or create in Ceph 0.87

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



So, I reproduced this issue in the same cluster with a non-optimized
debug build and found the problem, which looks like it still affects
master.

The cause of the assertion failure is a signed integer overflow, as
with issue 9592 (https://tracker.ceph.com/issues/9592), only in this
case it's because of a type safety issue on the union in ceph_osd_op
(https://github.com/ceph/ceph/blob/v0.87/src/include/rados.h#L445-L473).

The notify op gets created in this code path
(https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L177),
which sets op.watch.ver to the last version of the object. In the case
of the problematic volume that triggered this issue for us, the
version is 3197560675. This goes into the union in ceph_osd_op, and
happens to occupy the same position as the extent length would.

When we get to calc_op_budget, the definition of CEPH_OSD_OP_NOTIFY
has the CEPH_OSD_OP_MODE_READ and CEPH_OSD_OP_TYPE_DATA bits set, so it ends
up hitting the code path that looks at the extent length
(https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2464).

I guess the solution to this is to add a more stringent check to
calc_up_budget to ensure that only operation types that use the extent
interpretation of the union at the time the operation is created end
up in the code path that checks extent length. It looks like calling
ceph_osd_op_uses_extent instead of ceph_osd_op_type_data would be the
simplest way to accomplish this, but as an outsider, I'd appreciate a
sanity check from you to confirm that I'm understanding this
correctly. I've confirmed that all of the op types in
ceph_osd_op_uses_extent have the CEPH_OSD_TYPE_DATA bit set, so this
seems like a pretty safe and sensible change to make, but if there's
anything more to it than that, please let me know, otherwise I can
submit a patch for this.

Cheers,
Simon





On Mon, Oct 29, 2018 at 6:34 PM Simon Ruggier <simon@xxxxxxxxxxx> wrote:
>
> To attempt to answer your questions, I just ran a for loop to list the
> snapshots on every volume, and that finishes successfully. There are a
> few other snapshots on volumes in the pool, but the usage pattern is
> generally that snapshots are created to provide a point-in-time copy
> of the volume. The snapshots are copied elsewhere and then once the
> final copy of the backup has been written, the snapshot is removed.
> So, there's about 6000 volumes, and not many snapshots, but they get
> created and deleted regularly.
>
> The pool seems like it's functioning in general. I can't say
> with confidence that there aren't any other unexplained errors or
> assertion failures happening with specific volumes or snapshots,
> similar to this one, but I'm not aware of any other cases as of now.
>
> On Mon, Oct 29, 2018 at 5:26 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > [ Re-send to make it through the vger filters; sorry! ]
> >
> > Hmm, yeah. The ticket and failure mode makes me wonder if something
> > has gotten so strange with this image that the notify's bufferlist
> > actually exceeded a reasonable size, but I don't really see a
> > mechanism for that.
> > What snapshots exist on the pool? Can you successfully examine it in
> > other ways with the rbd image manipulation tools?
> >
> > On Fri, Oct 26, 2018 at 3:48 PM Simon Ruggier <simon@xxxxxxxxxxx> wrote:
> > >
> > > First of all, thanks for your reply.
> > >
> > > Yeah, this is happening within the process executing the rbd command.
> > > Sorry I didn't include the backtrace in my original email, I
> > > completely forgot after putting together the rest of it.
> > >
> > > I set "debug objecter = 20" in the local ceph config file on the
> > > system I ran these commands on, then ran rbd snap create, snap ls, and
> > > snap rm, so you could look at debug output from any of those
> > > three. I saved the entire session, anonymized all names in the output,
> > > and compressed it. See attached. If you need any other information,
> > > let me know and I'll collect it when I'm able to.
> > > On Fri, Oct 26, 2018 at 5:19 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > > >
> > > > This is happening on the client side? Can you provide the full
> > > > backtrace and a log with "debug objecter = 20" turned on?
> > > >
> > > > On Sun, Oct 21, 2018 at 11:25 AM Simon Ruggier <simon@xxxxxxxxxxx> wrote:
> > > > >
> > > > > Hi, I'm writing about a problem I'm seeing in a Ceph 0.87 cluster
> > > > > where rbd snap create, rm, etc. are succeeding, but aborting with a
> > > > > non-zero return code because the notify call at the very end of the
> > > > > function (https://github.com/ceph/ceph/blob/v0.87/src/librbd/internal.cc#L468)
> > > > >  is hitting an assertion failure (Throttle.cc: 194: FAILED assert(c >=
> > > > > 0)).
> > > > >
> > > > > I did a bit of digging, and found that c is calculated in
> > > > > calc_op_budget (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2453-L2471),
> > > > > which is called in Objecter::_take_op_budget
> > > > > (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L1597-L1608),
> > > > > but could hypothetically be called again in Objecter::_throttle_op
> > > > > (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2473-L2491),
> > > > > if the first calculation returned 0. From diving into the rd.notify
> > > > > call in IoCtxImpl.notify
> > > > > (https://github.com/ceph/ceph/blob/v0.87/src/librados/IoCtxImpl.cc#L1117),
> > > > > I can see that the call adds an op of type CEPH_OSD_OP_NOTIFY
> > > > > (https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.h#L865),
> > > > > which is defined at
> > > > > https://github.com/ceph/ceph/blob/v0.87/src/include/rados.h#L185. From
> > > > > that, we know that it's the code path at
> > > > > https://github.com/ceph/ceph/blob/v0.87/src/osdc/Objecter.cc#L2463-L2464
> > > > > that will be taken while calculating the budget, but from there I
> > > > > can't tell where or why there would be extents set on a notify
> > > > > operation. I'm not familiar with the Ceph codebase, so that's the
> > > > > point where I figured I should ask for some advice about this from
> > > > > someone who actually understands this stuff.
> > > > >
> > > > > I also noticed the possibly related issue #9592
> > > > > (http://tracker.ceph.com/issues/9592), but I'm not totally sure if
> > > > > it's the same issue, it looks like a pretty different reproduction
> > > > > process.
> > > > >
> > > > > I'm not expecting any bugfixes for such an old version of Ceph, but
> > > > > I'd appreciate help just understanding what's different with this
> > > > > particular volume and how to clean it up by hand, and in the unlikely
> > > > > event that this is a problem in the current development version of
> > > > > Ceph, perhaps this can be considered a bug report.



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux