Re: reftable: new ref storage format

Shawn Pearce <spearce@xxxxxxxxxxx> · Sun, 23 Jul 2017 16:03:43 -0700

On Sun, Jul 23, 2017 at 3:56 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>> On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>>> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>
>> * What would you think about being extravagant and making the
>> value_type a full byte? It would make the format a tiny bit easier to
>> work with, and would leave room for future enhancements (e.g.,
>> pseudorefs, peeled symrefs, support for the successors of SHA-1s)
>> without having to change the file format dramatically.
>
> I reran my 866k file with full byte value_type. It pushes up the
> average bytes per ref from 33 to 34, but the overall file size is
> still 28M (with 64 block size). I think its reasonable to expand this
> to the full byte as you suggest.

FYI, I went back on this in the v3 draft I posted on Jul 22 in
https://public-inbox.org/git/CAJo=hJvxWg2J-yRiCK3szux=eYM2ThjT0KWo-SFFOOc1RkxXzg@xxxxxxxxxxxxxx/

I expanded value_type from 2 bits to 3 bits, but kept it as a bit
field in a varint. I just couldn't justify the additional byte per ref
in these large files. The prefix compression works well enough that
many refs are still able to use only a single byte for the
suffix_length << 3 | value_type varint, keeping the average at 33
bytes per ref.

The reftable format uses values 0-3, leaving 4-7 available. I reserved
4 for an arbitrary payload like MERGE_HEAD type files.