Re: Fwd: question about snapset

On Fri, 3 Feb 2017, sheng qiu wrote:
> ---------- Forwarded message ----------
> From: sheng qiu <herbert1984106@xxxxxxxxx>
> Date: Fri, Feb 3, 2017 at 7:45 AM
> Subject: Re: question about snapset
> To: Sage Weil <sage@xxxxxxxxxxxx>
> Cc: ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> 
> 
> Hi Sage,
> 
> Thanks a lot for your reply. It's very helpful.
> We are trying to avoid the query for the snapset object on new object
> writes; we think it may save some latency. Currently, when we measure
> the op prepare latency, it's around 300 us, which seems quite high. Is
> this normal? In our test we configure 64 PGs per OSD and use five
> shards with two workers per shard. The test machine has four sockets,
> 40 cores, and plenty of memory.
> We are thinking about how to reduce this latency; do you have any suggestions?

There are several sources of latency, and this is an active area of 
investigation and optimization.  I'm not sure the snapset specifically 
is a big part of the problem; it's more likely the overall work 
involved in get_object_context(), which fetches the attributes from 
the object.  The snapset will be a small part of that.

I suggest joining the weekly performance call if you can.  Or we can 
discuss some of the specific efforts on the list.  The main efforts here 
are

- simplifying ms_fast_dispatch so that incoming messages get queued more 
quickly
- making the new BlueStore (ObjectStore implementation) faster
- a big planned refactor for most of the do_op work that happens in 
between.

That's pretty vague, but it's going to be a big project.  Right now we're 
trying to remove as much legacy complexity as we can first, to make our 
lives a bit easier...

sage


> 
> Thanks
> Sheng
> 
> 
> On Feb 3, 2017 6:05 AM, "Sage Weil" <sage@xxxxxxxxxxxx> wrote:
> >
> > On Thu, 2 Feb 2017, sheng qiu wrote:
> > > Hi cephers,
> > >
> > > We are reading the code of the Ceph I/O path. We found that within
> > > do_op(), it tries to get the object context each time, as well as the
> > > snapset context.
> > >
> > > May I ask what the purpose of the snapset context is, and what the
> > > relationship is between the snapset context and an individual object? Does
> > > each object have a snapset context? Is it stored as an attribute within
> > > the onode associated with the object?
> > > I read some articles that said only the head object has a snapset context.
> > > May I ask what the head object is, and what the relationship is between
> > > a head object and other non-head objects?
> >
> > A given logical object may be contained by several snapshots, and we'll
> > have a separate clone for each unique version of the object.  The
> > head (snapid == CEPH_NOSNAP) is the latest read/write version of the
> > object.  A clone (snapid < CEPH_NOSNAP) is the version for some number of
> > snapshots.  For example, if you wrote X, took snapshot 1, wrote X', wrote
> > X'', took snap 2, took snap 3, then wrote X''', you'd have two clones,
> > with something like
> >
> > X    (snapid 1) snaps=[1]
> > X''  (snapid 3) snaps=[2,3]
> > X''' (head) clones=[1,3]
> >
> > The SnapSet is attached to the head and tells us we have 2 clones (1 and
> > 3).  Each clone has a snaps vector that tells us which snaps it exists in.
> > There's some other bookkeeping in SnapSet as well that tells us what data
> > extents are identical across adjacent clones (so that they can share
> > blocks on disk efficiently).
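As a rough illustration of the head/clone bookkeeping described above, here is a hypothetical toy model in Python. These are not Ceph's actual data structures, and read_at_snap is invented for illustration; only the CEPH_NOSNAP/CEPH_SNAPDIR sentinel values and the head/clone layout follow the explanation in the mail.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CEPH_SNAPDIR = 2**64 - 1  # (u64)(-1): pseudo-object that holds the SnapSet if the head is gone
CEPH_NOSNAP  = 2**64 - 2  # (u64)(-2): marks the head (latest read/write) version

@dataclass
class Clone:
    snapid: int        # snapid < CEPH_NOSNAP
    snaps: List[int]   # which snapshots this clone's data is valid for
    data: str

@dataclass
class Head:
    data: str
    clones: List[int] = field(default_factory=list)  # SnapSet bookkeeping lives on the head

# The example from the text: wrote X, took snap 1, wrote X' and X'',
# took snaps 2 and 3 (no write in between, so one clone covers both), wrote X'''.
clones: Dict[int, Clone] = {
    1: Clone(snapid=1, snaps=[1], data="X"),
    3: Clone(snapid=3, snaps=[2, 3], data="X''"),
}
head = Head(data="X'''", clones=[1, 3])

def read_at_snap(snapid: int) -> str:
    """Resolve which version of the object a given snapid sees."""
    for c in clones.values():
        if snapid in c.snaps:
            return c.data
    return head.data  # newer than any clone: reads hit the head

print(read_at_snap(1))            # X
print(read_at_snap(3))            # X''
print(read_at_snap(CEPH_NOSNAP))  # X'''
```

In the real code the SnapSet is stored as an object attribute on the head (or SNAPDIR) object, which is why get_object_context() ends up fetching it along with the other attributes.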
> >
> > There's one annoying oddity: if the head doesn't logically exist, we
> > create an object with snapid CEPH_SNAPDIR and attach the SnapSet to
> > that.  We hope to remove this soon (by storing the SnapSet on the head
> > and marking the head as a whiteout), as it complicates the code.
> >
> > Hope that helps!
> > sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 