On Sat, Jul 03, 2010 at 08:55:43PM -0400, tytso@xxxxxxx wrote:

> > I noticed that my improved time for "git tag --contains" was similar to
> > the total time for "git rev-list --all >/dev/null". Can you try timing
> > that? My suspicion is that it is going to be about 2.9 seconds for you.
>
> I'm at home, so getting access to my work machine is a bit of a pain,
> so I replicated the experiment at home.

Thanks. Those numbers confirm what I had been thinking.

> Yep, it does blow up in the face of the extreme clock skew in some of
> the ext4 commits in the Linux kernel tree. (Sorry about that; mea
> culpa, I didn't realize at the time this was a problem, and it was my
> workflow using the guilt program which happened to introduce them.)

Yes, I was thinking specifically of those commits when I warned about
clock skew. :)

> In any case, because of the ext4 commits, I can show a test case which
> doesn't work well with your date cutoff patch:

Not surprising. I think you will find that "git name-rev" (or "git
describe --contains", which simply calls name-rev) will have similar
problems.

> (Or maybe we have git config options that can enable or disable
> optimizations that depend on the lack of clock skews; but I could
> understand people not wanting to maintain the extra code paths.)

I think the best thing we can do is provide a "how much clock skew to
tolerate" variable, and give it a sane default. Then people who know
they have skewed repositories can make the correctness-optimization
tradeoff as they see fit.

The extra code is very minor. It's really only a line or two of code
when calculating the cutoff date to convert "be thorough" into a cutoff
date of 0.

The real question is what that default should be. Name-rev already uses
86400 seconds. The worst skew in git.git is 8 seconds. The worst skew in
linux-2.6.git is 8622098 (about 100 days).
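To make the "line or two of code" concrete, here is a minimal sketch of
the cutoff calculation. It is an illustration only, not the code from
the patches: the names cutoff_date and allowed_skew are hypothetical,
and using None to mean "be thorough" is an assumption made for the
example.

```python
from typing import Optional


def cutoff_date(want_date: int, allowed_skew: Optional[int]) -> int:
    """Return the timestamp below which a traversal may stop.

    want_date:    commit timestamp (seconds since the epoch) of the
                  commit we are searching for.
    allowed_skew: seconds of clock skew to tolerate, or None for
                  "be thorough" (hypothetical convention).
    """
    if allowed_skew is None:
        # "Be thorough" becomes a cutoff of 0, so nothing is ever
        # old enough to be pruned.
        return 0
    # Commits older than this cannot contain want_date's commit,
    # provided no timestamp is off by more than allowed_skew.
    return max(0, want_date - allowed_skew)
```

With a tolerance of one day, cutoff_date(1000000, 86400) gives 913600;
with None the cutoff is always 0 and the traversal walks everything.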

For reference, here are my timings on "git tag --contains HEAD~200" for
various allowable clock skew values:

  0 (don't allow even a second of clock skew): .035s
  86400 (one day of clock skew allowed): .034s
  8622098 (the worst skew in linux-2.6): .252s
  infinite (never cutoff for clock skew): 5.373s

So anything below a day is pointless and lost in the noise. Even 100
days yields quite a satisfactory speedup from the current code, but
obviously that number is totally arbitrary based on one repo.

As you probably guessed from the specificity of the number, I wrote a
short program to actually traverse and find the worst skew. It takes
about 5 seconds to run (unsurprisingly, since it is doing the same full
traversal that we end up doing in the above numbers). So we could
"autoskew" by setting up the configuration on clone, and then
periodically updating it as part of "git gc".

That is perhaps over-engineering (and would add a few seconds to a
clone), but I like that it would Just Work without the user doing
anything.

I'll follow this mail up with a series that implements a cleaned-up
version of the previous patches in this thread, and we'll see what
others think.

-Peff
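
The worst-skew measurement mentioned above can be sketched roughly as
follows. This is not the program from the mail; it assumes that "skew"
means the largest amount by which a parent's timestamp is newer than
that of its direct child, and the real program may define it
differently. The function name worst_skew and the in-memory history
format are hypothetical.

```python
from typing import Dict, List, Tuple

# Maps a commit id to (commit timestamp, list of parent ids).
History = Dict[str, Tuple[int, List[str]]]


def worst_skew(history: History) -> int:
    """Return the largest parent-newer-than-child gap, in seconds."""
    worst = 0
    for date, parents in history.values():
        for parent in parents:
            parent_date = history[parent][0]
            # A positive difference means the parent claims to be
            # newer than its child, i.e. the clocks were skewed.
            worst = max(worst, parent_date - date)
    return worst


# Tiny made-up history: "b" is dated 50 seconds before its parent "a".
example = {
    "a": (100, []),
    "b": (50, ["a"]),
    "c": (200, ["b"]),
}
print(worst_skew(example))
```

For a real repository, input in this shape could be built from the
output of "git log --all --format='%H %ct %P'" (commit id, committer
timestamp, parent ids), which needs the same full traversal as the
timings above.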