On Sun, Jul 30, 2017 at 8:51 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> 4th iteration of the reftable storage format.
>
> You can read a rendered version of this here:
> https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md
>
> Significant changes from v3:
> - Incorporated Michael Haggerty's update_index concept for reflog.
> - Explicitly documented log-only tables.

I have read through v4 and was missing the rationale for why this is a
good idea. Digging up the discussion for v3 seems to indicate that
reflogs and refs have different usage patterns, so different compaction
patterns are desired for each; hence we need separate files for them.

> ### Ref block format
>
> A ref block is written as:
>
>     'r'
>     uint24( block_len )
>     ref_record+
>     uint32( restart_offset )+

As block_size is encoded as a uint24 (and so is block_len), the restart
offsets could be made uint24 as well, though that may cause alignment
issues that make reading slower. But as the ref_records may already
leave the 32-bit ints unaligned, I am not worried about that.

>     uint16( restart_count )

When looking for hard-coded 16/32/64-bit ints I came across this one
once again. How much more expensive is reading a varint?

As block_len points at restart_count, we cannot easily replace it with
a varint, because a reader starting there would not know where the
varint begins. But we could use a byte-reversed varint instead, which
can be decoded backwards from its last byte. If we take this step,
could all restart offsets also be (reverse) varints?
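
To make the byte-reversed varint idea concrete, here is a minimal
Python sketch. It is only an illustration under stated assumptions: it
uses a plain LEB128-style encoding (7 payload bits per byte, high bit
as continuation flag), which is not necessarily the varint used
elsewhere in the reftable document, and every function name below is
hypothetical. The reader is assumed to know the offset just past the
end of the trailer and decodes backwards from there.

```python
from typing import List, Tuple


def encode_reverse_varint(value: int) -> bytes:
    """Encode value so it can be decoded starting from its LAST byte.

    Assumption: LEB128-style groups of 7 bits, least significant group
    first, then the byte order is reversed before writing.
    """
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # more groups follow (towards the front)
        else:
            out.append(group)         # final group, continuation bit clear
            break
    out.reverse()                     # least significant group ends up last
    return bytes(out)


def decode_reverse_varint(buf: bytes, end: int) -> Tuple[int, int]:
    """Decode the varint whose last byte is buf[end - 1].

    Returns (value, start), where start is the offset of its first byte.
    """
    pos, value, shift = end, 0, 0
    while True:
        if pos == 0:
            raise ValueError("truncated reverse varint")
        pos -= 1
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def write_restart_trailer(restart_offsets: List[int]) -> bytes:
    """Hypothetical trailer: offsets first, then the count at the very end."""
    out = bytearray()
    for offset in restart_offsets:
        out += encode_reverse_varint(offset)
    out += encode_reverse_varint(len(restart_offsets))
    return bytes(out)


def read_restart_trailer(block: bytes, end: int) -> Tuple[List[int], int]:
    """Read the count, then each offset, walking backwards from end."""
    count, pos = decode_reverse_varint(block, end)
    offsets = []
    for _ in range(count):
        offset, pos = decode_reverse_varint(block, pos)
        offsets.append(offset)
    offsets.reverse()                 # restore the order they were written in
    return offsets, pos


# Usage: some stand-in record bytes followed by the trailer.
records = b"\x00" * 10
block = records + write_restart_trailer([4, 300, 70000])
offsets, trailer_start = read_restart_trailer(block, len(block))
```

One trade-off worth keeping in mind with this sketch: fixed-width
restart offsets can be indexed directly during a binary search, whereas
variable-width ones have to be decoded sequentially before the search
can start.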