> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> Sent: Friday, June 17, 2016 10:55 AM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: encoding
>
> On Fri, 17 Jun 2016, Allen Samuels wrote:
> > I'm just trying to work out the encoding size.
> >
> > Igor has pointed out that unused could be replaced by a bitmap and
> > suggested that it could be quite small (4 or 8 bytes) -- though he
> > cites some particular examples about it being "small".
> >
> > Do we actually need a full bitmap? Might not a simple left-off pointer
> > be almost as good?
>
> He's right that it's only useful for a min_alloc_size'd blob, where we expect
> the bitmap to be about 16 bits. We could have a left/right boundary that
> encodes into 8 bits (two 0..15 values) that would probably capture 70% of the
> performance benefit and save one byte. I'm inclined to go for a bitmap for
> now, though...
>
> > w.r.t. ref_map. The comment in the code suggests that it will always
> > be empty for a non-shared blob. If that's correct perhaps it's not a
> > big deal.
>
> That was the original thought, but it didn't end up being the case. We could
> write a non-trivial chunk of code to rebuild that info at runtime from the
> lextent map, but I'd put that pretty far down on the list too.
> Instead, we can probably take advantage of the fact that most of the ref
> counts will be 1 or 0 and combine those ~2 bits into a more efficient record_t
> encoding. This should be sequenced after the first pass of varint encodings,
> though, which I'm partway through stabilizing.
> Should have a PR by Monday.

With some deeper thinking, it seems to me that the rebuild-ref-map-on-deserialize is pretty trivial. Isn't it just a walk of the lextent map and, for each entry, a corresponding call to add_ref for that referenced blob (assuming add_ref correctly combines overlapping references)?
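
To make the question concrete, here is a minimal sketch of that walk. Everything in it is an assumption for illustration: LExtent, RefMap, BlobRefs and rebuild_ref_maps are hypothetical stand-ins, not the actual BlueStore types, and the reference counts are kept as a simple difference map so that overlapping references combine without any special casing.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a logical extent; NOT the real BlueStore type.
struct LExtent {
  int64_t blob_id;       // blob this logical extent points at
  uint32_t blob_offset;  // start of the referenced range inside the blob
  uint32_t length;       // length of the referenced range
};

// Hypothetical lextent map: logical offset -> extent.
using LExtentMap = std::map<uint64_t, LExtent>;

// Hypothetical per-blob ref map, stored as a difference map:
// +1 where a reference starts, -1 just past where it ends.
struct RefMap {
  std::map<uint32_t, int> delta;

  // Record one reference to [offset, offset + length).
  void add_ref(uint32_t offset, uint32_t length) {
    assert(length > 0);
    delta[offset] += 1;
    delta[offset + length] -= 1;
  }

  // Number of references covering byte position pos (prefix sum of deltas).
  int refs_at(uint32_t pos) const {
    int refs = 0;
    for (const auto& kv : delta) {
      if (kv.first > pos) break;
      refs += kv.second;
    }
    return refs;
  }
};

using BlobRefs = std::map<int64_t, RefMap>;

// Rebuild every blob's ref map with a single walk of the lextent map.
BlobRefs rebuild_ref_maps(const LExtentMap& lextents) {
  BlobRefs out;
  for (const auto& kv : lextents) {
    const LExtent& le = kv.second;
    out[le.blob_id].add_ref(le.blob_offset, le.length);
  }
  return out;
}
```

With this representation add_ref is order-independent, so two lextents that point at overlapping ranges of the same blob simply stack their counts. Whether the real add_ref behaves this way is exactly the assumption in the question above.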
>
> sage
>
> >
> > Allen Samuels
> > SanDisk |a Western Digital brand
> > 2880 Junction Avenue, Milpitas, CA 95134
> > T: +1 408 801 7030| M: +1 408 780 6416
> > allen.samuels@xxxxxxxxxxx
> >
> >
> > > -----Original Message-----
> > > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > > Sent: Friday, June 17, 2016 10:03 AM
> > > To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> > > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > > Subject: Re: encoding
> > >
> > > On Fri, 17 Jun 2016, Allen Samuels wrote:
> > > > I don't understand the ref_map and unused stuff. What is their purpose?
> > >
> > > ref_map:
> > >
> > > We allocate blobs and write uncompressed data in. Later, we logically
> > > overwrite part of that reference, such that our big 1MB allocation only has
> > > part of it referenced. When that happens we want to release the
> > > unreferenced part back to the allocator for use by something else.
> > > ref_map lets us do that, with some additional complexity that counts
> > > references (from multiple clones).
> > >
> > > unused:
> > >
> > > We might allocate a full min_alloc_size but only write one block into it.
> > > If we do another small write in an adjacent block, we want to write into the
> > > existing blob. In order to do that without a WAL event, we need to
> > > know that the block isn't currently referenced by anything.
> > > ref_map isn't quite sufficient for this because we don't have a complicated
> > > commit/persist lifecycle sequence on update, and we need to make sure the
> > > *committed* state has no references before we can safely overwrite a
> > > block.
> > >
> > > After pondering this a while I decided that it would be very complex to
> > > implement, with marginal benefit, and in reality this mostly matters for
> > > newly-allocated but never-written blobs... hence unused.
> > > I decided we don't care about the case when you partially occlude part
> > > of a blob (but less than min_alloc_size, so it wasn't released back to
> > > the allocator) and *then* also do a small overwrite such that the space
> > > could be reused.
> > >
> > > sage
> >
> >
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html