On Tue, May 07, 2013 at 06:32:12AM +0200, Michael Haggerty wrote:

> Another potential problem caused by the non-atomicity of loose reference
> reading could be to confuse reachability tests if process 1 is reading
> loose references while process 2 is renaming a reference:
>
> 1. Process 1 looks for refs/heads/aaaaa and finds it missing.
>
> 2. Process 2 renames zzzzz -> aaaaa
>
> 3. Process 1 looks for refs/heads/zzzzz and finds it missing.
>
> Process 1 would think that any objects pointed to by aaaaa (formerly
> zzzzz) are unreachable. This would be unfortunate if it is doing an
> upload-pack and very bad if it is doing a gc. I wonder if this could be
> a problem in practice? (Gee, wouldn't it be nice to keep reflogs for
> deleted refs? :-) )

Ugh. Yeah, that is definitely a possible race, and it could be quite
disastrous for prune.

I am really starting to think that we need a pluggable refs
architecture, and that busy servers could use something like sqlite for
their ref storage. That would require bumping repositoryformatversion,
of course, but it would be OK for a server accessible only by git
protocols.

I also wonder if we could spend extra time to get more reliable results
for prune: check the refs, come up with a prune list, and then check
again. I have a feeling it's a 2-generals sort of problem where we can
always miss a ref that keeps bouncing around, but we could bound the
probability. I haven't thought that hard about it. Perhaps this will
give us something to talk about on Thursday. :)

> > My proposal above gets rid of the need to invalidate the loose refs
> > cache, so we can ignore that complexity.
>
> The same concern applies to invalidating the packed-ref cache, which
> you still want to do.

True. In theory a call to resolve_ref can invalidate it, so any call
from inside a for_each_ref callback would be suspect.

> * Preloading the whole tree of loose references before starting an
>   iteration. As I recall, this was a performance *win*.
> It was on my
> to-do list of things to pursue when I have some free time (ha, ha). I
> mostly wanted to check first that there are not callers who abort the
> iteration soon after starting it. For example, imagine a caller who
> tries to determine "are there any tags at all" by iterating over
> "refs/tags" with a callback that just returns 1; such a caller would
> suffer the cost of reading all of the loose references in "refs/tags".

Well, you can measure my patches, because that's what they do. :)

I didn't really consider an early termination from the iteration.
Certainly something like:

  git for-each-ref refs/tags | head -1

would take longer. Though if you have so many refs that the latency is a
big problem, you should probably consider packing them (it can't
possibly bite you with a race condition, right?).

-Peff
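P.S. The "check refs, build a prune list, check again" idea above could
be modeled roughly like this. This is a toy sketch, not git code:
`read_refs`, `reachable_from`, and the trivial reachability function are
hypothetical stand-ins, and as noted, a ref that bounces around during
both snapshots can still slip through; the second check only narrows the
window.

```python
def compute_prune_list(read_refs, reachable_from, all_objects):
    """Snapshot the refs, compute prune candidates, then take a second
    snapshot and spare anything that has become reachable in the
    meantime. This bounds, but does not eliminate, the rename race."""
    before = read_refs()
    candidates = all_objects - reachable_from(before.values())
    after = read_refs()
    return candidates - reachable_from(after.values())

# Simulate the race from the thread: the first snapshot happens exactly
# while "zzzzz" is being renamed to "aaaaa", so neither name is visible;
# by the second snapshot the ref has reappeared under its new name.
snapshots = iter([{}, {"refs/heads/aaaaa": "c1"}])
read_refs = lambda: next(snapshots)
reachable_from = lambda tips: set(tips)  # toy reachability: just the tips
prune = compute_prune_list(read_refs, reachable_from, {"c1", "c2"})
# "c1" is spared by the re-check; "c2" is genuinely unreferenced.
```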