Re: reftable: new ref storage format

Jeff King <peff@xxxxxxxx> · Sun, 16 Jul 2017 06:01:41 -0400

On Sat, Jul 15, 2017 at 11:01:47PM -0700, Shawn Pearce wrote:

> > How deep would you anticipate stacks getting? Would it be feasible for
> > the tip to contain the names of the tables in the entire chain? If we're
> > talking about 20 (or even 32) bytes per name, you could still fit over a
> > hundred names in a 4K inode.
> 
> I think we'd want to keep the stacks under 32, which is a reasonable
> amount of space used in the header of each reftable. I don't have this
> yet in my updated document + implementation, but I'll look at trying
> to add it over the next couple of days. Your idea to hold the explicit
> list of the stack in each reftable makes for a very safe atomic reader
> view.

Great.  I was thinking about this a bit and have one more possible
tweak.

If you store the names of the dependent reftables in each table, then
you end up using storage quadratic in the size of the stack. Because
the bottom-most table has 0 pointers, then the next one has 1, and then
next one has 2, and so on, until the nth one has n.

Now we're talking about n=32 here, so that's probably OK.

But one variant is that the reftables _don't_ know about their
ancestors. Instead, the list of reftables is kept in a top-level pointer
file, and it's that pointer file which is rewritten on update. I.e., a
write is something like:

   1. Take reftable.lock

   2. Write reftables/1234abcd to represent your update.

   3. Copy the old reftable to reftable.lock, then append "1234abcd".

   4. Atomic rename into place.

And the reader is just:

  1. Open reftable, read the list of tables.

  2. In parallel, open/fetch each of the tables and find your starting
     pointer for iteration/lookup.

  3. Do a list-merge on the open tables.

The one thing you lose is that "unreachable" reftables no longer form a
meaningful hierarchy. With the pointers inside the reftables themselves,
if your "reftable" file got corrupted, you could find the dangling table
at the apex of the graph and have a good guess at the ref state.
Without, you just have a jumble of states and you don't know which takes
precedence (though you could probably make a good guess from mtimes).

> I added log support to the reftable format. I updated [1] to reflect
> log blocks at the end of the file. I ran a year's worth of log
> records, 149,932 log entries on 43,061 refs to test:

Cool. I'll be on vacation for the next week, so apologies if I don't
keep the discussion going. But I'm very excited about the overall
direction. :)

-Peff