Re: reftable [v5]: new ref storage format

Jeff King <peff@xxxxxxxx> · Mon, 14 Aug 2017 23:54:33 -0400

On Mon, Aug 14, 2017 at 04:05:05PM +0000, David Turner wrote:

> > All that aside, we could simply add an EXCLUSIVE open-flag to LMDB, and
> > prevent multiple processes from using the DB concurrently. In that case,
> > maintaining coherence with other NFS clients is a non-issue. It strikes me that git
> > doesn't require concurrent multi-process access anyway, and any particular
> > process would only use the DB for a short time before closing it and going away.
> 
> Git, in general, does require concurrent multi-process access, depending on what 
> that means.
> 
> For example, a post-receive hook might call some git command which opens the 
> ref database.  This means that git receive-pack would have to close and 
> re-open the ref database.  More generally, a fair number of git commands are
> implemented in terms of other git commands, and might need the same treatment.
> We could, in general, close and re-open the database around fork/exec, but I am
> not sure that this solves the general problem -- by mere happenstance, one might
> be e.g. pushing in one terminal while running git checkout in another.  This is 
> especially true with git worktrees, which share one ref database across multiple
> working directories.

Yeah, I'd agree that git's multi-process way of working would probably
cause some headaches if there were a broad lock.

I had the impression that Howard meant we would lock for _read_
operations, too. If so, I think that's probably going to cause a
noticeable performance problem for servers.  A repository which is
serving fetches to a lot of clients (even if some of those are no-ops)
has to send the current ref state out to each client. I don't think we'd
want to add a serial bottleneck to that portion of each process, which
can otherwise happen totally in parallel.

Serializing writes is probably not so big a deal, as long as the lock is
held only for the portion where the process is actively writing out
values, and as long as there's a reasonable backoff/retry protocol.
Right now we don't generally bother retrying ref locks because they're
taken individually, so racing on a lock almost certainly[1] means that
you've lost the sha1-lease and need to restart the larger operation.
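
To make "backoff/retry" concrete, here is a rough sketch of what such a
loop could look like around a lockfile created with O_CREAT|O_EXCL. It
is illustration only, not git's code: the names acquire_lock and
release_lock, the delay and timeout values, and the lock path are all
made up for the example, and Python is used just to keep it short.

```python
import os
import random
import time


def acquire_lock(path, timeout=1.0, initial_delay=0.001, max_delay=0.05):
    """Create `path` exclusively, retrying with randomized backoff.

    Returns an open file descriptor on success, or None if `timeout`
    seconds pass without getting the lock. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lockfile already exists,
            # i.e. some other writer currently holds the lock.
            return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        # Sleep a random fraction of the current delay so that competing
        # writers do not all retry in lockstep, then double the delay
        # (capped at max_delay).
        time.sleep(min(remaining, random.uniform(0, delay)))
        delay = min(delay * 2, max_delay)


def release_lock(path, fd):
    """Close the descriptor and remove the lockfile. (Hypothetical helper.)"""
    os.close(fd)
    os.unlink(path)
```

The point of the sketch is only that the lock is held for the short
window where values are written out, and that a loser waits briefly and
tries again instead of failing the whole operation.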

-Peff

[1] Actually, we've found this isn't always true. Things like ref
    packing require taking locks for correctness, which means they can
    interfere with actual ref updates. That's yet another thing it would
    be nice to get rid of when moving away from the loose/packed
    storage.