Re: Git commit generation numbers

Jeff King <peff@xxxxxxxx> · Thu, 14 Jul 2011 16:01:44 -0400

On Thu, Jul 14, 2011 at 12:23:31PM -0700, Linus Torvalds wrote:

> On Thu, Jul 14, 2011 at 12:08 PM, Jeff King <peff@xxxxxxxx> wrote:
> >
> > If we aren't going to go whole-hog on generation numbers, I'm much more
> > tempted to simply keep using commit timestamps.
> 
> Sure. I think it's entirely reasonable to say that the issue basically
> boils down to one git question: "can commit X be an ancestor of commit
> Y" (as a way to basically limit certain algorithms from having to walk
> all the way down). We've used commit dates for it, and realistically
> it really has worked very well. But it was always a broken heuristic.

Yeah, I agree with that.

> So yes, I personally see generation counters as a way to do the commit
> date comparisons right. And it would be perfectly fine to just say "if
> there are no generation numbers, we'll use the datestamps instead, and
> know that they could be incorrect".

In that case, is it really worth adding generation numbers to the cache?
Because they _can_ be wrong, too. I suspect they will be wrong less
often than commit timestamps, if only because they're dirt simple to
calculate. But all it takes is some crappy porcelain doing:

  git cat-file commit $foo |
  munge_the_parents |
  git hash-object -t commit --stdin -w

to give us a bogus object. Sure, we can catch it via fsck. But we could
also catch commit timestamp skew via fsck just as easily.
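
For concreteness, here is a rough sketch of what "dirt simple to
calculate" and "catch it via fsck" could look like. It is not git's
code. It assumes a toy commit graph held in a dict (commit id -> list of
parent ids), assumes root commits get generation 1, and the function
names are made up for illustration.

```python
def compute_generations(parents):
    """Return {commit: generation} for a DAG given as {commit: [parent, ...]}.

    Assumption for this sketch: a root commit has generation 1, and every
    other commit has 1 + the largest generation among its parents.
    """
    gen = {}
    for start in parents:
        stack = [start]  # explicit stack, so deep histories don't hit recursion limits
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            missing = [p for p in parents[c] if p not in gen]
            if missing:
                stack.extend(missing)  # resolve parents first
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def cannot_be_ancestor(x, y, gen):
    """True if generations alone prove x is not a proper ancestor of y."""
    return x != y and gen[x] >= gen[y]


def check_cached_generations(parents, cached):
    """fsck-style check: list commits whose stored generation is wrong."""
    fresh = compute_generations(parents)
    return sorted(c for c in parents if cached.get(c) != fresh[c])
```

A stored number that was copied along while the parents were munged
shows up in check_cached_generations() the same way a skewed timestamp
would show up in a timestamp check, which is the point above: both are
derived data that a sloppy tool can get wrong.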

> That "use the datestamps" fallback thing may well involve all the
> heuristics we already do (ie check for the stamps looking sane, and
> not trusting just one individual one).

Those aren't foolproof, of course. I asked people a few months ago to
run my skew-detection program on various repos, and some repos have long
runs of skew (think somebody with a bad clock or a bogus program doing a
whole series). But they're fast and work OK in practice. We should apply
them more consistently (name-rev, for example, will tolerate a day of
skew, but will not look past a single commit).
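
As an illustration of what skew detection means here (this is not the
skew-detection program mentioned above, which isn't shown in this
message), a minimal sketch under the same toy-graph assumption: a commit
counts as skewed when its timestamp is older than that of one of its
direct parents. The function name and data layout are hypothetical.

```python
def find_skew(parents, timestamp):
    """Return {commit: seconds} for commits dated earlier than a direct parent.

    parents:   {commit: [parent, ...]}
    timestamp: {commit: committer time in seconds since the epoch}
    """
    skew = {}
    for commit, ps in parents.items():
        # Largest amount by which any direct parent is newer than this commit.
        worst = max((timestamp[p] - timestamp[commit] for p in ps), default=0)
        if worst > 0:
            skew[commit] = worst
    return skew
```

Note that this only compares each commit with its direct parents, so in
a long run of skewed commits only the first one in the run is reported.
Comparing against the newest ancestor instead would flag the whole run,
at the cost of a full topological walk.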

And if people really want to be thorough, we can mark the skewed commits
in a cache during "git gc" for them (or they can just say "for this
traversal, I want to be thorough; turn off timestamp cutoffs").
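
To show the trade-off, here is a hedged sketch of a timestamp cutoff with
slop and a "be thorough" switch. It is a toy model, not git's traversal
code: the graph is a dict of parent lists, the function and parameter
names are invented, and the default slop of one day simply mirrors the
tolerance mentioned for name-rev.

```python
ONE_DAY = 86400  # seconds


def is_ancestor(x, y, parents, timestamp, slop=ONE_DAY):
    """Walk back from y looking for x.

    With a numeric slop, stop descending past commits dated more than
    `slop` seconds before x (fast, but wrong if history is skewed by more
    than that). With slop=None, ignore timestamps and walk everything.
    """
    cutoff = None if slop is None else timestamp[x] - slop
    seen = set()
    stack = [y]
    while stack:
        c = stack.pop()
        if c == x:
            return True
        if c in seen:
            continue
        seen.add(c)
        if cutoff is not None and timestamp[c] < cutoff:
            continue  # too old to be worth following, according to the clock
        stack.extend(parents[c])
    return False
```

With a commit in the middle of the chain dated several days before its
own parent, the default call gives up early and returns the wrong
answer, while slop=None finds the ancestor by paying for the full walk.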

Out of curiosity, what don't you like about the generation cache? Is it
the idea of using external storage? Generating it on the fly? Or is the
particular implementation too slow or crappy?

-Peff