Re: reftable [v4]: new ref storage format

Shawn Pearce <spearce@xxxxxxxxxxx> · Thu, 3 Aug 2017 19:50:28 -0700

On Thu, Aug 3, 2017 at 3:48 PM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
> I've revised the blockless reftable proposal to address some feedback:

I've been thinking more about your blockless proposal.

I experimentally modified my reftable implementation to omit padding
between blocks, bringing it a tiny bit closer to your blockless
proposal. Unfortunately this slightly increased latency for lookups on
a 4k chunk size. I speculate this is because chunks are no longer
aligned with the filesystem page, and I'm forcing Mac OS X to give me
two pages out of the filesystem. Using padding to align to a 4k block
is slightly faster, and the average wasted per block is <=20 bytes,
too small to fit another ref.

The restart table and binary search within the 4k block is a
performance win. Disabling the restart table significantly increased
lookup latency.

tl;dr:  I think the block alignment and restart table are wins vs. the
multi-level index.

A suggested downside of my reftable design is the ref index at 4k
block size for 866k refs is 199 KiB, and must be paged in for binary
search to locate the correct block for any lookup. The pack idx for
the two main packs in this repository is 210 MiB. We think fairly
little of mmap'ing 210 MiB to perform binary search to find object
data. 199 KiB for ref data seems to be a bargain. An advantage of the
single level index is its only one page touched after the index is
loaded.

Hot reftable reads (5.6 usec) are faster than loose ref reads (6.5
usec). Once the ref index is loaded, reftable can read a ref more
quickly than the time required to open-read-close a loose ref.
Admittedly, a large index slows down a cold read.

tl;dr:  I just don't think the size of the index is a concern.

I really favor the reflog data in a different section from the ref
values themselves. Even for smaller transaction files, it improves
scan and lookup time by allowing readers who just care about the name
and SHA-1 value of a ref to not be paging in or skipping over log
record payloads. However, I also agree that the aggregates may benefit
from ref and log being separate files.

> * Add a way to mark a reflog entry "deleted" without having to rewrite
> everything. This is mostly meant to deal with `refs/stash`.

This is an interesting idea. Given how I implemented reftable in JGit,
just inserting a deletion record for the same (ref,update_index) tuple
would make it trivial to hide the prior entry.

> * Define an extension mechanism.
> * Define the SHA-1 → object mapping as an extension rather than as
> part of the main spec. My gut feeling is that it will never be
> implemented for git-core.

While the SHA-1 -> object mapping may never be implemented for
git-core, I'd still prefer to see it as an optional part of the file
specification, rather than an extension that is specified. IMHO the
extension stuff in DIRC has made it unnecessarily complicated, and
we've still revved that file through many revisions.

> * Revise how the SHA-1 → object mapping works:
>     * Merge the bottommost OBJ_INDEX node with the old OBJ nodes to
> form a new OBJ_LEAF node.
>     * Allow the branching factor of each node to be specified
> independently (to allow the node sizes to be matched more closely to
> the preferred read sizes).

I'm not sure objects warrant this kind of complexity. The obj support
in reftable is nearly identical to the ref support. I have a
significant amount of code that is common between them. Your approach
has objects different enough from refs that they need their own code,
increasing complexity in both the writer and reader.

> I currently lean towards the opinion that we should store pseudorefs
> (like `FETCH_HEAD`, `MERGE_HEAD` *outside of* reftables, except for
> `HEAD` (which behaves more like a normal reference, which is
> considered for reachability, and for which we want to retain reflogs),
> which we should store *in* reftables.

I'm on the fence, and don't really have a strong opinion about where
we store the pseudorefs. Happy to keep them in $GIT_DIR, happy to have
them supported inside a reftable.