On Tue, May 07, 2013 at 06:32:12AM +0200, Michael Haggerty wrote:

> Another potential problem caused by the non-atomicity of loose reference
> reading could be to confuse reachability tests if process 1 is reading
> loose references while process 2 is renaming a reference:
>
> 1. Process 1 looks for refs/heads/aaaaa and finds it missing.
>
> 2. Process 2 renames zzzzz -> aaaaa
>
> 3. Process 1 looks for refs/heads/zzzzz and finds it missing.
>
> Process 1 would think that any objects pointed to by aaaaa (formerly
> zzzzz) are unreachable. This would be unfortunate if it is doing an
> upload-pack and very bad if it is doing a gc. I wonder if this could be
> a problem in practice? (Gee, wouldn't it be nice to keep reflogs for
> deleted refs? :-) )

Ugh. Yeah, that is definitely a possible race, and it could be quite
disastrous for prune.

I am really starting to think that we need a pluggable refs
architecture, and that busy servers could use something like sqlite for
their ref storage. That would require bumping repositoryformatversion,
of course, but it would be OK for a server accessible only by git
protocols.

I also wonder if we could spend extra time to get more reliable results
for prune: check the refs, come up with a prune list, and then check
again. I have a feeling it's a 2-generals sort of problem where we can
always miss a ref that keeps bouncing around, but we could bound the
probability. I haven't thought that hard about it. Perhaps this will
give us something to talk about on Thursday. :)

> > My proposal above gets rid of the need to invalidate the loose refs
> > cache, so we can ignore that complexity.
>
> The same concern applies to invalidating the packed-ref cache, which
> you still want to do.

True. In theory a call to resolve_ref can invalidate it, so any call
from inside a for_each_ref callback would be suspect.

> * Preloading the whole tree of loose references before starting an
>   iteration. As I recall, this was a performance *win*.
> It was on my
> to-do list of things to pursue when I have some free time (ha, ha). I
> mostly wanted to check first that there are not callers who abort the
> iteration soon after starting it. For example, imagine a caller who
> tries to determine "are there any tags at all" by iterating over
> "refs/tags" with a callback that just returns 1; such a caller would
> suffer the cost of reading all of the loose references in "refs/tags".

Well, you can measure my patches, because that's what they do. :)

I didn't really consider an early termination from the iteration.
Certainly something like:

  git for-each-ref refs/tags | head -1

would take longer. Though if you have so many refs that the latency is a
big problem, you should probably consider packing them (it can't
possibly bite you with a race condition, right?).

-Peff
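P.S. The "check refs, build a prune list, check again" idea above could
be modeled roughly like this. This is a toy sketch, not git code:
`read_refs`, `reachable_from`, and the trivial reachability function are
hypothetical stand-ins, and as noted, a ref that bounces around during
both snapshots can still slip through; the second check only narrows the
window.

```python
def compute_prune_list(read_refs, reachable_from, all_objects):
    """Snapshot the refs, compute prune candidates, then take a second
    snapshot and spare anything that has become reachable in the
    meantime. This bounds, but does not eliminate, the rename race."""
    before = read_refs()
    candidates = all_objects - reachable_from(before.values())
    after = read_refs()
    return candidates - reachable_from(after.values())

# Simulate the race from the thread: the first snapshot happens exactly
# while "zzzzz" is being renamed to "aaaaa", so neither name is visible;
# by the second snapshot the ref has reappeared under its new name.
snapshots = iter([{}, {"refs/heads/aaaaa": "c1"}])
read_refs = lambda: next(snapshots)
reachable_from = lambda tips: set(tips)  # toy reachability: just the tips
prune = compute_prune_list(read_refs, reachable_from, {"c1", "c2"})
# "c1" is spared by the re-check; "c2" is genuinely unreferenced.
```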