On Mon, Jul 05, 2010 at 10:10:12AM -0400, tytso@xxxxxxx wrote:

> As time progresses, the clock skew breakage should be less likely to
> be hit by a typical developer, right?  That is, unless you are
> specifically referencing one of the commits which were skewed, two
> years from now, the chances of someone (who isn't doing code
> archeology) of getting hit by a problem should be small, right?  This

It's not about directly referencing skewed commits. It's about
traversing history that contains skewed commits. So if I have a history
like:

  A -- B -- C -- D

and "B" is skewed, then I will generally give up on finding "A" when
searching backwards from "C" or "D", or their descendants. So as time
moves forward, you will continue to have your old tags pointing to "C"
or "D", but also tags pointing to their descendants. Doing "git tag
--contains A" will continue to be inaccurate, since it will continue to
look for "A" from "C" and "D", but also from newer tags, all of which
involve traversing the skewed "B".

What I think is true is that people will be less likely to look at "A"
as time goes on, as code it introduced presumably becomes less relevant
(either bugs are shaken out, or it gets replaced, or whatever). And
obviously looking at "C" from "D", the skew in "B" will be irrelevant.

So I think typical developers become less likely to hit the issue as
time goes on, but software archaeologists will hit it forever.

> If so, I could imagine the automagic scheme choosing a default that
> only finds the worst skew in the past N months.  This would speed
> things up for users who are using repositories that have skews in the
> distant past, at the cost of introducing potentially confusing edge
> cases for people doing code archeology.

How do you decide, when looking for commits that have bogus timestamps,
which ones happened in the past N months? Certainly you can do some
statistical analysis to pick out anomalous ones.
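
As a rough illustration of what "finding skew" could look like, here is
a small Python sketch that flags every commit whose timestamp is older
than that of its newest parent. The Commit class and the function names
are made up for this example and are not part of git. Note that it
cannot tell which of the two commits carries the wrong clock, and it
does not solve the "past N months" problem, since the timestamps you
would use to pick that window are the very thing in question.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical stand-in for a commit: a name, a timestamp, parents."""
    name: str
    date: int  # committer timestamp, seconds since the epoch
    parents: list = field(default_factory=list)


def past_skews(commits):
    """Map commit name -> seconds by which it predates its newest parent."""
    skews = {}
    for c in commits:
        if not c.parents:
            continue  # root commits have nothing to compare against
        newest_parent = max(p.date for p in c.parents)
        if c.date < newest_parent:
            skews[c.name] = newest_parent - c.date
    return skews


def worst_skew(commits):
    """Largest backwards step in the history, or 0 if there is none."""
    return max(past_skews(commits).values(), default=0)


# A -- B -- C -- D, where B's clock was set far in the past.
A = Commit("A", 1000)
B = Commit("B", 400, [A])
C = Commit("C", 1200, [B])
D = Commit("D", 1300, [C])
history = [A, B, C, D]

print(past_skews(history))  # {'B': 600}
print(worst_skew(history))  # 600
```
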
And you could perhaps favor future skewing over past skewing, since that
skew doesn't tend to impact traversal cutoffs (and large past skewing
seems to be more common). But that is getting kind of complex.

> I'm not sure this is a good tradeoff, but given in practice how rarely
> most developers go back in time more than say, 12-24 months, maybe
> it's worth doing.  What do you think?

I'm not sure. I am tempted to just default it to 86400 and go no
further. People who care about archaeology can turn off traversal
cutoffs if they like, and as the skewed history ages, people get less
likely to look at it.

We could also pick half a year or some high number as the default
allowable skew. The performance increase is still quite noticeable
there, and it covers the only large skew we know about. I'd be curious
to see if other projects have skew, and how much.

-Peff
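
To make the A -- B -- C -- D case concrete, here is a rough Python model
of a date-based traversal cutoff with an allowable skew. It is only a
sketch under simplifying assumptions: the Commit class, contains(), and
the "slop" parameter are invented for illustration and are not git's
actual code or option names. The walk visits commits newest-first and
stops descending below any commit older than the target's date minus
the slop.

```python
import heapq
from dataclasses import dataclass, field

DAY = 86400  # seconds


@dataclass
class Commit:
    """Hypothetical stand-in for a commit: a name, a timestamp, parents."""
    name: str
    date: int  # committer timestamp, seconds since the epoch
    parents: list = field(default_factory=list)


def contains(tip, target, slop=0):
    """Is target reachable from tip, given a date cutoff with some slop?"""
    cutoff = target.date - slop
    seen = {tip.name}
    heap = [(-tip.date, tip.name, tip)]  # max-heap on date via negation
    while heap:
        _, _, c = heapq.heappop(heap)
        if c is target:
            return True
        if c.date < cutoff:
            continue  # looks older than the target: give up on this path
        for p in c.parents:
            if p.name not in seen:
                seen.add(p.name)
                heapq.heappush(heap, (-p.date, p.name, p))
    return False


# B's timestamp is 90 days older than that of its parent A.
A = Commit("A", 100 * DAY)
B = Commit("B", 10 * DAY, [A])
C = Commit("C", 102 * DAY, [B])
D = Commit("D", 103 * DAY, [C])

print(contains(D, C))                  # True: B is never examined
print(contains(D, A))                  # False: the walk stops at B
print(contains(D, A, slop=DAY))        # False: one day does not cover it
print(contains(D, A, slop=180 * DAY))  # True: half a year does
```

In this model a one-day allowance still misses "A" behind a 90-day
skew, while a half-year allowance finds it at the cost of walking
further back before giving up.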