RE: encoding

Allen Samuels <Allen.Samuels@xxxxxxxxxxx> · Mon, 20 Jun 2016 15:56:55 +0000

> -----Original Message-----
> From: Igor Fedotov [mailto:ifedotov@xxxxxxxxxxxx]
> Sent: Monday, June 20, 2016 5:03 AM
> To: Sage Weil <sweil@xxxxxxxxxx>; Allen Samuels
> <Allen.Samuels@xxxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: encoding
> 
> 
> On 17.06.2016 23:16, Sage Weil wrote:
> > On Fri, 17 Jun 2016, Allen Samuels wrote:
> >>> -----Original Message-----
> >>> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> >>> Sent: Friday, June 17, 2016 10:55 AM
> >>> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> >>> Cc: ceph-devel@xxxxxxxxxxxxxxx
> >>> Subject: RE: encoding
> >>>
> >>> On Fri, 17 Jun 2016, Allen Samuels wrote:
> >>>> I'm just trying to work out the encoding size.
> >>>>
> >>>> Igor has pointed out that unused could be replaced by a bitmap and
> >>>> suggested that it could be quite small (4 or 8 bytes) -- though he
> >>>> cites some particular examples about it being "small".
> >>>>
> >>>> Do we actually need a full bitmap? Might not a simple left-off
> >>>> pointer be almost as good?
> >>> He's right that it's only useful for a min_alloc_size'd blob, where
> >>> we expect the bitmap to be about 16 bits.  We could have a
> >>> left/right boundary that encodes into 8 bits (two 0..15 values) that
> >>> would probably capture 70% of the performance benefit and save one
> >>> byte.  I'm inclined to go for a bitmap for now, though...
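
A minimal sketch of the two encodings compared above, in C++. The bit meaning (bit i set means chunk i of 16 is unused), the choice of storing the first and last *used* chunk index, and all function names are assumptions for illustration; none of this is taken from the BlueStore code. It shows why the one-byte form is lossy: unused chunks in the middle of the blob cannot be recorded.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: bit i of the bitmap set => chunk i (0..15) is unused.
// Assumption: at least one chunk is used, so the bitmap is never 0xffff.

// Pack the first and last used chunk index (each 0..15) into one byte.
uint8_t pack_bounds(unsigned first_used, unsigned last_used) {
  assert(first_used <= last_used && last_used <= 15);
  return static_cast<uint8_t>((first_used << 4) | last_used);
}

// Reduce a 16-bit unused bitmap to the packed left/right boundary.
uint8_t bounds_from_bitmap(uint16_t unused) {
  assert(unused != 0xffff);
  unsigned first = 0, last = 15;
  while (unused & (1u << first)) ++first;  // skip leading unused chunks
  while (unused & (1u << last)) --last;    // skip trailing unused chunks
  return pack_bounds(first, last);
}

// Expand packed bounds back into a bitmap. Only leading and trailing
// unused chunks come back; holes inside the used range are lost.
uint16_t bitmap_from_bounds(uint8_t bounds) {
  unsigned first = bounds >> 4, last = bounds & 0x0f;
  uint16_t unused = 0;
  for (unsigned i = 0; i < 16; ++i) {
    if (i < first || i > last)
      unused = static_cast<uint16_t>(unused | (1u << i));
  }
  return unused;
}
```

With these definitions, a bitmap whose unused chunks sit only at the two ends round-trips exactly, while a bitmap with an unused chunk in the middle comes back with that chunk marked as used.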
> >>>
> >>>> w.r.t. ref_map. The comment in the code suggests that it will
> >>>> always be empty for a non-shared blob. If that's correct perhaps
> >>>> it's not a big deal.
> >>> That was the original thought, but it didn't end up being the case.
> >>> We could write a non-trivial chunk of code to rebuild that info at
> >>> runtime from the lextent map, but I'd put that pretty far down on the
> >>> list too.
> >>> Instead, we can probably take advantage of the fact that most of the
> >>> ref counts will be 1 or 0 and combine those ~2 bits into a more
> >>> efficient record_t encoding.  This should be sequenced after the
> >>> first pass of varint encodings, though, which I'm partway through
> >>> stabilizing.
> >>> Should have a PR by Monday.
> >> With some deeper thinking, it seems to me that the
> >> rebuild-ref-map-on-deserialize is pretty trivial. Isn't it just a
> >> walk of the lextent map and for each entry a corresponding call to
> >> add_ref for that referenced blob (assuming add_ref correctly combines
> >> overlapping references)????
> > You're exactly right.
> >
> > We encode the "blob map" and append it to the encoded onode or put it
> > in the bnode key.  I think we just need to make 2 variations of encode
> > (to elide the ref_map), and to make the onode decode case rebuild it
> > as you say.
> +1
> Moreover, we can probably omit ref_map for onodes that lack clones and
> use the lextent map instead. This way we eliminate the need to rebuild the
> ref_map on onode load and reduce the onode memory footprint.

Even better.
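
For reference, the rebuild-on-decode walk discussed above could look roughly like the following C++ sketch. The types LExtent, RefMap, LExtentMap and BlobRefMaps and the function rebuild_ref_maps are hypothetical, simplified stand-ins, not the actual BlueStore structures. The ref map here is kept as a difference map (+1 where a reference starts, -1 where it ends), which is a technique chosen for brevity so that overlapping references combine without interval splitting; the real add_ref may work differently.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a logical extent: which blob it points at,
// and which byte range of that blob it references.
struct LExtent {
  int64_t blob_id;
  uint32_t blob_offset;
  uint32_t length;
};

// Per-blob reference counts stored as a difference map.
struct RefMap {
  std::map<uint32_t, int> delta;  // blob offset -> change in ref count

  void add_ref(uint32_t offset, uint32_t length) {
    if (length == 0) return;
    if (++delta[offset] == 0) delta.erase(offset);
    if (--delta[offset + length] == 0) delta.erase(offset + length);
  }

  // Number of references covering the byte at 'offset'.
  int refs_at(uint32_t offset) const {
    int refs = 0;
    for (const auto& kv : delta) {
      if (kv.first > offset) break;
      refs += kv.second;
    }
    return refs;
  }
};

using LExtentMap = std::map<uint64_t, LExtent>;  // logical offset -> lextent
using BlobRefMaps = std::map<int64_t, RefMap>;   // blob id -> ref map

// Walk the lextent map once and add one reference per entry.
BlobRefMaps rebuild_ref_maps(const LExtentMap& lextents) {
  BlobRefMaps out;
  for (const auto& kv : lextents) {
    const LExtent& le = kv.second;
    out[le.blob_id].add_ref(le.blob_offset, le.length);
  }
  return out;
}
```

The walk is linear in the number of lextents. Under Igor's suggestion, an onode without clones would skip this step entirely and consult the lextent map directly.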

> 
> >
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
