> -----Original Message-----
> From: Igor Fedotov [mailto:ifedotov@xxxxxxxxxxxx]
> Sent: Monday, June 20, 2016 5:03 AM
> To: Sage Weil <sweil@xxxxxxxxxx>; Allen Samuels
> <Allen.Samuels@xxxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: encoding
>
>
> On 17.06.2016 23:16, Sage Weil wrote:
> > On Fri, 17 Jun 2016, Allen Samuels wrote:
> >>> -----Original Message-----
> >>> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> >>> Sent: Friday, June 17, 2016 10:55 AM
> >>> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> >>> Cc: ceph-devel@xxxxxxxxxxxxxxx
> >>> Subject: RE: encoding
> >>>
> >>> On Fri, 17 Jun 2016, Allen Samuels wrote:
> >>>> I'm just trying to work out the encoding size.
> >>>>
> >>>> Igor has pointed out that unused could be replaced by a bitmap and
> >>>> suggested that it could be quite small (4 or 8 bytes) -- though he
> >>>> cites some particular examples about it being "small".
> >>>>
> >>>> Do we actually need a full bitmap? Might not a simple left-off
> >>>> pointer be almost as good?
> >>> He's right that it's only useful for a min_alloc_size'd blob, where
> >>> we expect the bitmap to be about 16 bits. We could have a
> >>> left/right boundary that encodes into 8 bits (two 0..15 values) that
> >>> would probably capture 70% of the performance benefit and save one
> >>> byte. I'm inclined to go for a bitmap for now, though...
> >>>
> >>>> w.r.t. ref_map. The comment in the code suggests that it will
> >>>> always be empty for a non-shared blob. If that's correct perhaps
> >>>> it's not a big deal.
> >>> That was the original thought, but it didn't end up being the case.
> >>> We could write a non-trivial chunk of code to rebuild that info at
> >>> runtime from the lextent map, but I'd put that pretty far down on
> >>> the list too.
> >>> Instead, we can probably take advantage of the fact that most of the
> >>> ref counts will be 1 or 0 and combine those ~2 bits into a more
> >>> efficient record_t encoding. This should be sequenced after the
> >>> first pass of varint encodings, though, which I'm partway through
> >>> stabilizing.
> >>> Should have a PR by Monday.
> >> With some deeper thinking, it seems to me that the
> >> rebuild-ref-map-on-deserialize is pretty trivial. Isn't it just a
> >> walk of the lextent map and for each entry a corresponding call to
> >> add_ref for that referenced blob (assuming add_ref correctly combines
> >> overlapping references)????
> > You're exactly right.
> >
> > We encode the "blob map" and append it to the encoded onode or put it
> > in the bnode key. I think we just need to make 2 variations of encode
> > (to elide the ref_map), and to make the onode decode case rebuild it
> > as you say.
> +1
> Moreover we can probably omit ref_map for onodes that lack clones and
> use the lextent map instead. This way we eliminate the need to rebuild
> the ref_map on onode load and reduce the onode memory footprint.

Even better.

> >
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
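
For reference, the rebuild-ref-map-on-deserialize idea from the thread (walk
the lextent map and call add_ref once per entry) could look roughly like the
sketch below. This is only an illustration under stated assumptions, not the
BlueStore code: lextent_t, ref_map_t, split, count_at and rebuild_ref_maps are
hypothetical stand-ins, and the only thing taken from the thread is that
add_ref has to combine overlapping references correctly.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for a logical extent: which blob it points at and
// which byte range of that blob it references. Not the real BlueStore type.
struct lextent_t {
  int64_t blob;
  uint64_t offset;
  uint64_t length;
};

// Hypothetical per-blob reference map. Key k holds the reference count for
// the blob range [k, next key); the last key always holds 0.
struct ref_map_t {
  std::map<uint64_t, uint32_t> bounds;

  // Ensure a boundary exists at pos, inheriting the count of the range
  // that pos currently falls into (0 if it falls before the first key).
  void split(uint64_t pos) {
    auto it = bounds.lower_bound(pos);
    if (it != bounds.end() && it->first == pos)
      return;
    uint32_t inherited = (it == bounds.begin()) ? 0 : std::prev(it)->second;
    bounds.emplace(pos, inherited);
  }

  // Add one reference to [offset, offset + length). Overlapping calls are
  // combined by bumping the count of every sub-range that is covered.
  void add_ref(uint64_t offset, uint64_t length) {
    if (length == 0)
      return;
    split(offset);
    split(offset + length);
    for (auto it = bounds.find(offset); it->first != offset + length; ++it)
      ++it->second;
  }

  // Reference count of the byte at pos.
  uint32_t count_at(uint64_t pos) const {
    auto it = bounds.upper_bound(pos);
    return it == bounds.begin() ? 0 : std::prev(it)->second;
  }
};

using lextent_map_t = std::map<uint64_t, lextent_t>;  // logical offset -> extent
using blob_refs_t = std::map<int64_t, ref_map_t>;     // blob id -> ref map

// Rebuild every blob's ref map from the lextent map alone, as would be done
// on onode decode if the ref_map were elided from the encoding.
blob_refs_t rebuild_ref_maps(const lextent_map_t& lextents) {
  blob_refs_t refs;
  for (const auto& entry : lextents) {
    const lextent_t& le = entry.second;
    refs[le.blob].add_ref(le.offset, le.length);
  }
  return refs;
}
```

The rebuild is a single pass over the lextent map, so the decode cost scales
with the number of lextents. For an onode without clones, the suggestion in
the thread is to skip this map entirely and consult the lextent map directly.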