---------- Forwarded message ----------
From: sheng qiu <herbert1984106@xxxxxxxxx>
Date: Fri, Feb 3, 2017 at 7:45 AM
Subject: Re: question about snapset
To: Sage Weil <sage@xxxxxxxxxxxx>
Cc: ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>

Hi Sage,

Thanks a lot for your reply. It's very helpful.

We are trying to avoid the query for the snapset object on new object
writes; we think it may save some latency.
Currently, when we measure the op prepare latency, it is around 300 us,
which is quite high. Is this normal?
In our test, we configure 64 PGs per OSD and use five shards with two
workers per shard. The test machine is a four-socket system with 40 cores
and plenty of memory.

We are thinking about how to reduce this latency. Do you have any
suggestions?

Thanks,
Sheng

On Feb 3, 2017 6:05 AM, "Sage Weil" <sage@xxxxxxxxxxxx> wrote:
>
> On Thu, 2 Feb 2017, sheng qiu wrote:
> > Hi cephers,
> >
> > We are reading the code of the Ceph I/O path. We found that within
> > do_op(), it tries to get the object context each time, as well as the
> > snapset context.
> >
> > May I ask what the snapset context is used for, and what the
> > relationship is between the snapset context and the individual object?
> > Does each object have a snapset context? Is it stored as an attribute
> > within the onode associated with the object?
> > I read some articles that said only the head object has a snapset
> > context. May I ask what the head object is, and what the relationship
> > is between a head object and other non-head objects?
>
> A given logical object may be contained by several snapshots, and we'll
> have a separate clone for each unique version of the object. The
> head (snapid == CEPH_NOSNAP) is the latest read/write version of the
> object. A clone (snapid < CEPH_NOSNAP) is the version for some number of
> snapshots.
> For example, if you wrote X, took snapshot 1, wrote X', wrote
> X'', took snap 2, took snap 3, then wrote X''', you'd have 3 objects
> (2 clones plus the head), something like
>
>  X    (snapid 1)  snaps=[1]
>  X''  (snapid 3)  snaps=[2,3]
>  X''' (head)      clones=[1,3]
>
> The SnapSet is attached to the head and tells us we have 2 clones (1 and
> 3). Each clone has a snaps vector that tells us which snaps it exists in.
> There's some other bookkeeping in SnapSet as well that tells us what data
> extents are identical across adjacent clones (so that they can share
> blocks on disk efficiently).
>
> There's one annoying oddity: if the head doesn't logically exist, we
> create an object with snapid CEPH_SNAPDIR and attach the SnapSet to
> that. We hope to remove this soon (by storing SnapSet on head and marking
> head as a whiteout) as it complicates the code.
>
> Hope that helps!
> sage
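
The clone bookkeeping in Sage's example (write X, snap 1, write X', write
X'', snap 2, snap 3, write X''') can be replayed with a small toy model in
Python. This is only an illustrative sketch, not Ceph code: ToyObject and
write() are hypothetical names, and the rule that a clone takes the id of
the newest snapshot it covers is an assumption inferred from the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyObject:
    """Toy model of one logical object: a head plus its clones (not Ceph code)."""
    head: Optional[str] = None          # latest read/write version
    seq: int = 0                        # newest snapshot id seen by a write
    clones: List[int] = field(default_factory=list)            # SnapSet-like list of clone ids
    clone_data: Dict[int, str] = field(default_factory=dict)   # clone id -> preserved data
    clone_snaps: Dict[int, List[int]] = field(default_factory=dict)  # clone id -> snaps it exists in

    def write(self, data: str, existing_snaps: List[int]) -> None:
        """Overwrite the head, first cloning it if snapshots were taken since the last write."""
        new_snaps = sorted(s for s in existing_snaps if s > self.seq)
        if self.head is not None and new_snaps:
            # Assumption: the clone is named after the newest snapshot it covers.
            clone_id = new_snaps[-1]
            self.clones.append(clone_id)
            self.clone_data[clone_id] = self.head
            self.clone_snaps[clone_id] = new_snaps
        if existing_snaps:
            self.seq = max(existing_snaps)
        self.head = data


# Replay the sequence from the example.
obj = ToyObject()
obj.write("X", [])            # no snapshots yet
obj.write("X'", [1])          # snap 1 was taken: X is preserved as clone 1
obj.write("X''", [1])         # no new snapshot: head is overwritten in place
obj.write("X'''", [1, 2, 3])  # snaps 2 and 3 were taken: X'' becomes clone 3

print(obj.clones)       # [1, 3]
print(obj.clone_snaps)  # {1: [1], 3: [2, 3]}
print(obj.head)         # X'''
```

Note that X' never becomes a clone in this model, because no snapshot was
taken while it was the head; that is why only two clones exist after four
writes.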