Re: reftable [v4]: new ref storage format

Stefan Beller <sbeller@xxxxxxxxxx> · Mon, 31 Jul 2017 12:01:13 -0700

On Sun, Jul 30, 2017 at 8:51 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> 4th iteration of the reftable storage format.
>
> You can read a rendered version of this here:
> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>
> Significant changes from v3:
> - Incorporated Michael Haggerty's update_index concept for reflog.
> - Explicitly documented log-only tables.

I have read through v4 and I was missing the rationale
to why this is a good idea, digging up the discussion for
v3 seems to indicate that reflogs and refs themselves have
different usage patterns such that different compaction patterns
are desired, hence we need to have different files for them.

> ### Ref block format
>
> A ref block is written as:
>
>     'r'
>     uint24( block_len )
>     ref_record+
>     uint32( restart_offset )+

As the block_size is encoded in uint24, (and so is block_len),
the restart offsets could be made uint24 as well, though that may
have alignment issues, such that reading may be slower.
But as the ref_records may produce unaligned 32 bit ints
already, I am not worried about that.

>     uint16( restart_count )

When looking for 16/32/64 bit hard coded ints I came across
this one once again. How much more expensive is reading
a varint? As the block_len points at the restart_count, we cannot
replace it with a varint easily, but we could use a byte-reversed
varint instead. If we do this step, all restart offsets could also be
(reverse) varints?