I'm just trying to work out the encoding size.

Igor has pointed out that unused could be replaced by a bitmap and suggested that it could be quite small (4 or 8 bytes) -- though he cites some particular examples about it being "small". Do we actually need a full bitmap? Might not a simple left-off pointer be almost as good?

W.r.t. ref_map: the comment in the code suggests that it will always be empty for a non-shared blob. If that's correct, perhaps it's not a big deal.

Allen Samuels
SanDisk | a Western Digital brand
2880 Junction Avenue, Milpitas, CA 95134
T: +1 408 801 7030 | M: +1 408 780 6416
allen.samuels@xxxxxxxxxxx

> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> Sent: Friday, June 17, 2016 10:03 AM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: encoding
>
> On Fri, 17 Jun 2016, Allen Samuels wrote:
> > I don't understand the ref_map and unused stuff. What is their purpose?
>
> ref_map:
>
> We allocate blobs and write uncompressed data in. Later, we logically
> overwrite part of that reference, such that our big 1MB allocation only has
> part of it referenced. When that happens we want to release the
> unreferenced part back to the allocator for use by something else.
> ref_map lets us do that, with some additional complexity that counts
> references (from multiple clones).
>
> unused:
>
> We might allocate a full min_alloc_size but only write one block into it.
> If we do another small write in an adjacent block, we want to write into the
> existing blob. In order to do that without a WAL event, we need to
> know that the block isn't currently referenced by anything.
> ref_map isn't quite sufficient for this because we don't have a complicated
> commit/persist lifecycle sequence on update, and we need to make sure the
> *committed* state has no references before we can safely overwrite a
> block.
>
> After pondering this a while I decided that it would be very complex to
> implement that, with marginal benefit, and in reality this mostly matters for
> newly-allocated but never-written blobs... hence unused. I decided we
> don't care about the case when you partially occlude part of a blob (but less than
> min_alloc_size so it wasn't released back to the allocator) and
> *then* also do a small overwrite such that the space could be reused.
>
> sage
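
To make the bitmap-versus-pointer question concrete, here is a minimal sketch in Python. It is not BlueStore code. The class names, the block size, the min_alloc_size value, and the reading of "left-off pointer" as a high-water mark (everything at or beyond it has never been written) are all assumptions made for illustration. A write is reported as "direct" when every block it touches is known to be never-written, and as "wal" otherwise.

```python
BLOCK_SIZE = 4096            # assumed block size, illustration only
MIN_ALLOC_SIZE = 64 * 1024   # assumed min_alloc_size, illustration only
BLOCKS = MIN_ALLOC_SIZE // BLOCK_SIZE


class BitmapBlob:
    """Hypothetical blob that tracks never-written blocks with one bit each."""

    def __init__(self, nblocks=BLOCKS):
        self.nblocks = nblocks
        self.unused = (1 << nblocks) - 1  # every block starts out unused

    def write(self, first, count):
        """Mark blocks [first, first + count) as written.

        Returns "direct" if all of them were unused, else "wal".
        """
        mask = ((1 << count) - 1) << first
        direct = (self.unused & mask) == mask
        self.unused &= ~mask
        return "direct" if direct else "wal"

    def encoded_size(self):
        """Bytes needed to store the bitmap itself."""
        return (self.nblocks + 7) // 8


class PointerBlob:
    """Hypothetical blob that tracks only a high-water mark."""

    def __init__(self, nblocks=BLOCKS):
        self.nblocks = nblocks
        self.left_off = 0  # blocks at or beyond this index were never written

    def write(self, first, count):
        """Advance the mark past the write.

        Returns "direct" only if the write starts at or beyond the mark.
        """
        direct = first >= self.left_off
        self.left_off = max(self.left_off, first + count)
        return "direct" if direct else "wal"


if __name__ == "__main__":
    bitmap, pointer = BitmapBlob(), PointerBlob()
    # Write block 8, then a hole below it, then the next block, then block 8 again.
    for first in (8, 2, 9, 8):
        print(first, bitmap.write(first, 1), pointer.write(first, 1))
```

In this sketch the two schemes differ only on the second write: the bitmap still knows block 2 was never written, while the pointer has already moved past it and must fall back to the slower path. With the assumed 16 blocks the bitmap itself needs 2 bytes; whether the extra precision is worth it depends on how often writes arrive out of order within a blob.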