Re: encoding

Igor Fedotov <ifedotov@xxxxxxxxxxxx> · Mon, 20 Jun 2016 15:02:57 +0300

On 17.06.2016 23:16, Sage Weil wrote:
On Fri, 17 Jun 2016, Allen Samuels wrote:
-----Original Message-----
From: Sage Weil [mailto:sweil@xxxxxxxxxx]
Sent: Friday, June 17, 2016 10:55 AM
To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: encoding

On Fri, 17 Jun 2016, Allen Samuels wrote:
I'm just trying to work the encoding size.

Igor has pointed out that unused could be replaced by a bitmap and
suggested that it could be quite small (4 or 8 bytes) -- though he
cites some particular examples about it being "small".

Do we actually need a full bitmap? Might not a simple left-off pointer
be almost as good?
He's right that it's only useful for a min_alloc_size'd blob, where we expect
the bitmap to be about 16 bits.  We could have a left/right boundary that
encodes into 8 bits (two 0..15 values) that would probably capture 70% of the
performance benefit and save one byte.  I'm inclined to go for a bitmap for
now, though...
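
To make the size trade-off concrete, here is a rough sketch of the two
options. It is not BlueStore code: the names (UNITS, encode_bitmap,
encode_bounds, decode_bounds) are made up for illustration, and it assumes
16 tracked units per blob and that "left"/"right" mean the number of unused
units at each end.

```python
UNITS = 16  # assumed number of units tracked per min_alloc_size'd blob

def encode_bitmap(unused):
    """Full bitmap: one bit per unit, 2 bytes for 16 units."""
    bits = 0
    for i, is_unused in enumerate(unused):
        if is_unused:
            bits |= 1 << i
    return bits.to_bytes(2, "little")

def encode_bounds(unused):
    """Left/right boundary: two 0..15 counts packed into a single byte."""
    left = 0
    while left < UNITS - 1 and unused[left]:
        left += 1
    right = 0
    while right < UNITS - 1 - left and unused[UNITS - 1 - right]:
        right += 1
    return bytes([(left << 4) | right])

def decode_bounds(encoded):
    """Expand the packed byte back into a per-unit unused list."""
    left, right = encoded[0] >> 4, encoded[0] & 0xF
    return [i < left or i >= UNITS - right for i in range(UNITS)]
```

The boundary form saves one byte but cannot describe an unused hole in the
middle of the blob; it can only under-report unused space, never over-report
it.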

w.r.t. ref_map. The comment in the code suggests that it will always
be empty for a non-shared blob. If that's correct perhaps it's not a
big deal.
That was the original thought, but it didn't end up being the case.  We could
write a non-trivial chunk of code to rebuild that info at runtime from the
lextent map, but I'd put that pretty far down on the list too.
Instead, we can probably take advantage of the fact that most of the ref
counts will be 1 or 0 and combine those ~2 bits into a more efficient record_t
encoding.  This should be sequenced after the first pass of varint encodings,
though, which I'm partway through stabilizing.
Should have a PR by Monday.
With some deeper thinking, it seems to me that the
rebuild-ref-map-on-deserialize is pretty trivial. Isn't it just a walk
of the lextent map and for each entry a corresponding call to add_ref
for that referenced blob (assuming add_ref correctly combines
overlapping references)????
You're exactly right.
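
The walk described above might look roughly like the following. This is an
illustrative sketch, not the BlueStore implementation: RefMap,
rebuild_ref_maps and the lextent tuple layout (blob_id, blob_offset, length)
are assumptions, and add_ref is implemented here with a simple boundary-delta
scheme so that overlapping and adjacent references combine.

```python
from collections import defaultdict

class RefMap:
    """Per-blob reference counts, stored as +1/-1 deltas at extent boundaries."""

    def __init__(self):
        self._delta = defaultdict(int)

    def add_ref(self, offset, length):
        self._delta[offset] += 1
        self._delta[offset + length] -= 1

    def extents(self):
        """Return merged (offset, length, refs) runs with refs > 0."""
        out = []
        refs = 0
        prev = 0
        # Skip boundaries whose deltas cancelled, so adjacent runs merge.
        for pos in sorted(p for p, d in self._delta.items() if d):
            if refs > 0:
                out.append((prev, pos - prev, refs))
            refs += self._delta[pos]
            prev = pos
        return out

def rebuild_ref_maps(lextents):
    """lextents: logical_offset -> (blob_id, blob_offset, length)."""
    ref_maps = defaultdict(RefMap)
    for blob_id, blob_offset, length in lextents.values():
        ref_maps[blob_id].add_ref(blob_offset, length)
    return ref_maps
```

For example, two adjacent lextents pointing at consecutive halves of the same
blob collapse into a single run with a reference count of 1.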

We encode the "blob map" and append it to the encoded onode or put it in
the bnode key.  I think we just need to make 2 variations of encode (to
elide the ref_map), and to make the onode decode case rebuild it as you
say.
+1
Moreover, we can probably omit ref_map for onodes that lack clones and
use the lextent map instead. This way we eliminate the need to rebuild the
ref_map on onode load and reduce the onode memory footprint.
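
As a rough sketch of what I mean, a "is this blob range still referenced?"
query can be answered straight from the lextent map. The function name and
the lextent layout (logical_offset -> (blob_id, blob_offset, length)) are
assumptions for illustration only, and the linear scan is just the simplest
thing that works.

```python
def is_referenced(lextents, blob_id, offset, length):
    """True if any lextent overlaps [offset, offset + length) of the blob."""
    end = offset + length
    return any(
        b_id == blob_id and b_off < end and offset < b_off + b_len
        for b_id, b_off, b_len in lextents.values()
    )
```

This trades a lookup in a prebuilt ref_map for a scan over the lextents, which
should be acceptable when there are no clones sharing the blob.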

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
