On 05/07/2013 06:44 AM, Jeff King wrote:
> On Tue, May 07, 2013 at 06:32:12AM +0200, Michael Haggerty wrote:
>
>> Another potential problem caused by the non-atomicity of loose
>> reference reading could be to confuse reachability tests if process 1
>> is reading loose references while process 2 is renaming a reference:
>>
>> 1. Process 1 looks for refs/heads/aaaaa and finds it missing.
>>
>> 2. Process 2 renames zzzzz -> aaaaa
>>
>> 3. Process 1 looks for refs/heads/zzzzz and finds it missing.
>>
>> Process 1 would think that any objects pointed to by aaaaa (formerly
>> zzzzz) are unreachable.  This would be unfortunate if it is doing an
>> upload-pack and very bad if it is doing a gc.  I wonder if this could
>> be a problem in practice?  (Gee, wouldn't it be nice to keep reflogs
>> for deleted refs? :-) )
>
> Ugh. Yeah, that is definitely a possible race, and it could be quite
> disastrous for prune.
>
> I am really starting to think that we need a pluggable refs
> architecture, and that busy servers can use something like sqlite to
> keep the ref storage. That would require bumping
> repositoryformatversion, of course, but it would be OK for a server
> accessible only by git protocols.

That would be a fun project.  I like the idea of not burdening people's
local, mostly-one-user-at-a-time repositories with code that is
hardened against server-level pounding.

> I also wonder if we can spend extra time to get more reliable results
> for prune, like checking refs, coming up with a prune list, and then
> checking again. I have a feeling it's a 2-generals sort of problem
> where we can always miss a ref that keeps bouncing around, but we
> could bound the probability. I haven't thought that hard about it.
> Perhaps this will give us something to talk about on Thursday. :)

It's not 100% solvable without big changes; there could always be a
malign Dijkstra running around your system, renaming references right
before you read them.  But I guess it would be pretty safe if pack
would keep the union of the objects reachable from the references read
at the beginning of the run and the objects reachable from the
references read at (or rather near) the end of the run.

>> * Preloading the whole tree of loose references before starting an
>>   iteration.  As I recall, this was a performance *win*.  It was on
>>   my to-do list of things to pursue when I have some free time (ha,
>>   ha).  I mostly wanted to check first that there are not callers who
>>   abort the iteration soon after starting it.  For example, imagine a
>>   caller who tries to determine "are there any tags at all" by
>>   iterating over "refs/tags" with a callback that just returns 1;
>>   such a caller would suffer the cost of reading all of the loose
>>   references in "refs/tags".
>
> Well, you can measure my patches, because that's what they do. :) I
> didn't really consider an early termination from the iteration.
> Certainly something like:
>
>   git for-each-ref refs/tags | head -1
>
> would take longer. Though if you have so many refs that the latency is
> a big problem, you should probably consider packing them (it can't
> possibly bite you with a race condition, right?).

No, I don't see a correctness issue.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
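
For concreteness, the rename race described in the quoted scenario can
be reproduced by hand in two terminals.  This is only a sketch of the
window, using the branch names from the example; it says nothing about
how any particular git command actually schedules its ref lookups:

    # In a test repository where zzzzz exists and aaaaa does not:
    git branch zzzzz

    # "Process 1", step 1: look up aaaaa -- not found.
    git rev-parse --verify --quiet refs/heads/aaaaa || echo aaaaa missing

    # "Process 2", step 2: rename zzzzz to aaaaa.
    git branch -m zzzzz aaaaa

    # "Process 1", step 3: look up zzzzz -- not found either.
    git rev-parse --verify --quiet refs/heads/zzzzz || echo zzzzz missing

    # A reader that performed only these two lookups never saw the ref,
    # even though it existed under one name or the other the whole time.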
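
And a rough shell sketch of the "keep the union of two ref snapshots"
idea above, using only existing plumbing (git rev-list --objects --all);
the actual candidate scan is only hinted at in a comment, so this
illustrates the bounding argument rather than what git gc/prune do
today:

    # Closure of everything reachable from the refs as they exist at
    # the start of the run.
    git rev-list --objects --all | cut -d' ' -f1 | sort >reachable-before

    # ... the slow part of a prune would happen here: enumerating loose
    # objects, checking mtimes, building the candidate list ...

    # Closure of everything reachable from the refs as they exist near
    # the end of the run.
    git rev-list --objects --all | cut -d' ' -f1 | sort >reachable-after

    # Keep the union of the two closures; only an object that appears
    # in neither snapshot would remain a prune candidate.  A ref that
    # "bounces around" has to dodge both snapshots to cause a loss.
    sort -u reachable-before reachable-after >keep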