Based on some of Allen's comments I've updated my branch with (so far)
four different encoders:

1) varint - general purpose small integers (lops off high and low zero
   bits)

     first byte:
       low 2 bits = how many low nibbles of zeros
       5 bits = data
       1 high bit = another byte follows
     subsequent bytes:
       7 bits = data
       1 high bit = another byte follows

2) delta varint

     first byte:
       1 low bit = sign (0 = positive, 1 = negative)
       2 bits = how many low nibbles of zeros
       4 bits = data
       1 high bit = another byte follows
     subsequent bytes:
       7 bits = data
       1 high bit = another byte follows

3) raw lba:

     first 3 bytes:
       low 2 bits = how many low bits of zeros
         00 = none
         01 = 12 (4k alignment)
         10 = 16 (64k alignment)
         11 = 20 (256k alignment)
       21 bits = data
       1 high bit = another byte follows
     subsequent bytes:
       7 bits = data
       1 high bit = another byte follows

4) lba delta (distance between two lba's, e.g., when encoding a list of
   extents)

     first byte:
       1 low bit = sign (0 = positive, 1 = negative)
       2 bits = how many low bits of zeros
         00 = none
         01 = 12 (4k alignment)
         10 = 16 (64k alignment)
         11 = 20 (256k alignment)
       4 bits = data
       1 bit = another byte follows
     subsequent bytes:
       7 bits = data
       1 bit = another byte follows

Notably, on this last one we have 4 bits of data *and*, when we roll over
to the next value, you'll get 4 trailing 0's and we ask for one more
nibble of trailing 0's... still in one encoded byte.

I think this'll be a decent set of building blocks for encoding the
existing structures efficiently (and still in a generic way) before
getting specific with common patterns.

	https://github.com/ceph/ceph/pull/9728/files

sage

On Wed, 15 Jun 2016, Sage Weil wrote:
> 
> If we have those, I'm not sure #1 will be worth it--the zeroed offset
> fields will encode with one byte.
> 
> > (3) re-jiggering of blob/extents when possible.  Much of the two-level
> > blob/extent map exists to support compression.
> > When you're not
> > compressed you can collapse this into a single blob and avoid the
> > encoding overhead for it.
> 
> Hmm, good idea.  As long as the csum parameters match we can do this.  The
> existing function
> 
> 	int bluestore_onode_t::compress_extent_map()
> 
> currently just combines consecutive lextents that point to contiguous
> regions in the same blob.  We could extend this to combine blobs that are
> combinable.
> 
> > There are other potential optimizations too that are artifacts of the
> > current code.  For example, we support different checksum
> > algorithms/values on a per-blob basis.  Clearly moving this to a
> > per-onode basis is acceptable and would simplify and shrink the encoding
> > even more.
> 
> The latest csum branch
> 
> 	https://github.com/ceph/ceph/pull/9526
> 
> varies csum_order on a per-blob basis (for example, larger csum chunks for
> compressed blobs and small csum chunks for uncompressed blobs with 4k
> overwrites).  The alg is probably consistent across the onode, but that
> will uglify the code a bit to pass it into the blob_t csum methods.  I'd
> prefer to hold off on this.  With the varint encoding above it'll only be
> one byte per blob at least.
> 
> sage
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
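For concreteness, here is a minimal Python sketch of encoder 1 (the
general-purpose varint) as described in the message above. The function
names are mine, not from the branch; the real implementation is C++ in the
linked PR.

```python
def encode_varint(v):
    """Sketch of encoder 1: first byte holds a 2-bit count of trailing
    zero nibbles, 5 bits of data, and a continuation bit; subsequent
    bytes hold 7 bits of data plus a continuation bit."""
    assert v >= 0
    # strip up to 3 trailing zero nibbles (the count field is 2 bits wide)
    nibs = 0
    while nibs < 3 and v != 0 and (v & 0xf) == 0:
        v >>= 4
        nibs += 1
    out = bytearray()
    first = nibs | ((v & 0x1f) << 2)   # 2-bit nibble count + 5 data bits
    v >>= 5
    if v:
        first |= 0x80                  # high bit: another byte follows
    out.append(first)
    while v:
        b = v & 0x7f
        v >>= 7
        if v:
            b |= 0x80
        out.append(b)
    return bytes(out)

def decode_varint(buf):
    """Inverse of encode_varint; returns (value, bytes_consumed)."""
    nibs = buf[0] & 0x3
    v = (buf[0] >> 2) & 0x1f
    shift, i = 5, 1
    while buf[i - 1] & 0x80:
        v |= (buf[i] & 0x7f) << shift
        shift += 7
        i += 1
    return v << (4 * nibs), i
```

Note how a value like 0x1000 (4k) encodes to a single byte: three trailing
zero nibbles are stripped and the remaining data fits in the 5-bit field.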
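Encoder 3 (raw lba) can be sketched the same way, treating the first
three bytes as a little-endian 24-bit group: a 2-bit trailing-zero code
(0, 12, 16, or 20 bits), 21 bits of data, and a continuation bit. Again,
names and structure here are my own guesses at the layout described
above, not the actual branch code.

```python
# shift amounts selected by the 2-bit low-zero-run code
_LBA_SHIFTS = (0, 12, 16, 20)

def encode_lba(lba):
    """Sketch of encoder 3: 3-byte first group = 2-bit zero-run code +
    21 data bits + continuation bit, then 7-bit continuation bytes."""
    assert lba >= 0
    # pick the largest shift that divides the value evenly
    code = 0
    for c in (3, 2, 1):
        if lba != 0 and lba & ((1 << _LBA_SHIFTS[c]) - 1) == 0:
            code = c
            break
    v = lba >> _LBA_SHIFTS[code]
    word = code | ((v & 0x1fffff) << 2)  # 2-bit code + 21 bits of data
    v >>= 21
    if v:
        word |= 0x800000                 # high bit of the 3rd byte
    out = bytearray(word.to_bytes(3, 'little'))
    while v:
        b = v & 0x7f
        v >>= 7
        if v:
            b |= 0x80
        out.append(b)
    return bytes(out)

def decode_lba(buf):
    """Inverse of encode_lba; returns (lba, bytes_consumed)."""
    word = int.from_bytes(buf[:3], 'little')
    code = word & 0x3
    v = (word >> 2) & 0x1fffff
    shift, i = 21, 3
    cont = word & 0x800000
    while cont:
        v |= (buf[i] & 0x7f) << shift
        cont = buf[i] & 0x80
        shift += 7
        i += 1
    return v << _LBA_SHIFTS[code], i
```

With 21 data bits plus a 12-bit implied zero run, any 4k-aligned offset
below 8TB fits in the 3-byte first group.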
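The merge that compress_extent_map() currently performs, combining
consecutive lextents that point to contiguous regions in the same blob,
can be sketched generically. The LExtent type here is a hypothetical
stand-in for the C++ lextent structure, just to illustrate the merge
condition.

```python
from typing import NamedTuple

class LExtent(NamedTuple):
    logical_off: int   # offset within the object
    blob_id: int       # which blob the data lives in
    blob_off: int      # offset within that blob
    length: int

def compress_extent_map(extents):
    """Fold consecutive extents that are contiguous both logically and
    within the same blob into a single longer extent."""
    out = []
    for e in extents:
        if (out
                and out[-1].blob_id == e.blob_id
                and out[-1].logical_off + out[-1].length == e.logical_off
                and out[-1].blob_off + out[-1].length == e.blob_off):
            prev = out.pop()
            out.append(prev._replace(length=prev.length + e.length))
        else:
            out.append(e)
    return out
```

Extending this to also merge combinable blobs (same csum parameters, no
compression) would need access to the blob metadata, not just the
extent map.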