Re: reftable [v4]: new ref storage format

On Wed, Aug 02, 2017 at 12:50:39PM -0700, Junio C Hamano wrote:

> With the traditional "packed-refs plus loose" layout, no matter how
> many times a handful of selected busy refs are updated during the
> day, you'd need to open at most two files to find out the current
> value of a single ref (admittedly, the accessing of the second file,
> after we realize that there is no loose one, would be very costly).
> If you make a few commits on a topic branch A, then build a 100
> commit series on top of another topic branch B, finding the current
> value of A is still one open and read of refs/heads/A.
> 
> With the reftable format, we'd need to open and read all 100
> incremental transactions that touch branch B before realizing that
> none of them talk about A, and read the next transaction file to
> find the current value of A.  To keep this number low, we'd need
> quite a frequent compaction.

I think this is where compaction cleverness can come in.

One relatively easy optimization would be to note when the most recent
reftable contains a subset of the refs we are currently updating (and
the obvious string-of-updates to a single ref falls under that), and do
a "quick" compaction where we simply drop[1] that reftable in favor of
ours. That compaction is essentially free, because we know those entries
aren't valid anymore anyway.

I'm actually not sure if this is a strict "drop", though, because of
reflogs. If the reflog is written into the same file as the ref update,
then you'd need to roll its entry into your new update, too. But see
below anyway.
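To make the idea concrete, here's a toy sketch of that "quick" compaction. It models each reftable as a plain dict of refname -> value and ignores the reflog caveat above; the data model and the function name are invented for illustration, not anything in the actual reftable format:

```python
def quick_compact(stack, new_table):
    """If every ref in the newest reftable is also touched by the new
    update, all of its entries are superseded, so we can replace it
    instead of stacking on top of it.

    `stack` is a list of tables, oldest first; each table is a
    hypothetical dict mapping refname -> value.  Mutates `stack`."""
    if stack and set(stack[-1]) <= set(new_table):
        stack.pop()  # every entry in the old top is stale anyway
    stack.append(new_table)
    return stack
```

So a string of updates to the same busy ref keeps collapsing into a single table rather than growing the stack, while an update that touches a different ref stacks normally.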

For more complicated cases, there's some cleverness you can do with
small on-the-fly compactions. Even if there are entries in the last few
reftables that we're not currently updating, it's pretty cheap to roll a
few of them up into our new reftable if it lets us drop some
intermediate reftables. E.g., if we're writing a new reftable with a 4K
block size but only have 100 bytes of new data, we're probably best to
roll up a previous 500-byte reftable.

That one's an obvious optimization because we know that the filesystem
is going to make us spend 4K either way, so rounding up to that is
generally free-ish.
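A sketch of that roll-up decision, assuming a 4K block and greedily absorbing recent small tables for as long as the write stays within the blocks we'd pay for anyway (the block size and function are illustrative assumptions, not the reftable spec):

```python
BLOCK_SIZE = 4096  # assumed filesystem/reftable block size

def tables_to_roll_up(sizes, new_size, block=BLOCK_SIZE):
    """Given byte sizes of existing reftables (newest first) and the
    size of the data we're about to write, return the sizes of the
    tables worth folding into the new write: each one is absorbed only
    if it doesn't push the write into an extra block."""
    rolled, total = [], new_size
    for s in sizes:
        blocks_now = -(-total // block)          # ceil division
        blocks_with = -(-(total + s) // block)
        if blocks_with == blocks_now:
            rolled.append(s)                     # rounds up for free
            total += s
        else:
            break
    return rolled
```

With 100 bytes of new data, a previous 500-byte table (and more) fits in the same 4K block, so rolling it up costs nothing extra.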

What's less obvious is when we should roll up a bunch of 4K tables into
one (let's say) 32K table.  I think there's a formula for doing this
geometrically so that the amortized cost of writing stays under a
certain bound (linear?). But I haven't thought it through (or looked it
up); I was hoping Shawn had already done so and could dump that wisdom
on us.
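One common geometric scheme (a sketch of the general LSM-style technique, not necessarily what reftable specifies): require each table to be at least some factor larger than the combined size of everything newer than it, and on each write merge the newest run that violates that invariant. That keeps the stack depth, and hence the amortized rewrite cost per entry, logarithmic in the total data size. The factor and function here are assumptions for illustration:

```python
def compact_plan(sizes, factor=2):
    """Given reftable sizes oldest-first, return how many of the
    newest tables to merge into one so that every remaining table is
    at least `factor` times the sum of all tables newer than it."""
    n = 1  # the newly written table always participates
    while n < len(sizes):
        older = sizes[-(n + 1)]
        newest_sum = sum(sizes[-n:])
        if older < factor * newest_sum:
            n += 1      # older table too small to stay separate
        else:
            break
    return n
```

E.g., with sizes [64K, 32K, 4K, 4K] this merges only the two newest 4K tables into one 8K table, preserving 64 >= 2*32 >= 2*2*8; a stack that's geometric top to bottom only ever needs small merges near the top.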

> We can just declare that reftable format is not for desktop clients
> but for server implementations where frequent compaction would not
> be an annoyance to the users, but I'd wish we do not have to.

Yeah, I agree we should avoid that. I would love to eventually kill off
the loose/packed backend (or at least make it historical and read-only,
so that a reasonable response to people complaining about its races and
lack of atomicity is "please upgrade to reftables").

-Peff
