Re: Git commit generation numbers

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 15 Jul 2011 09:44:21 -0700

On Fri, Jul 15, 2011 at 9:18 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>
> What about going forward making the requirement that a new commit must
> have a committer date whose date is >= the maximum date of its
> parents?

So you suggest just making commit dates be the generation number.

I'd be ok with that. It's basically what we've been doing for the last
six years.

But in that case, we shouldn't be doing the generation count cache either.

Btw, I do agree that we probably should add a warning for the case
("your clock is wrong - your commit date is before the commit date of
your parents") and maybe require the use of "-f" or something to
override it. That would certainly be a good thing quite independently
of anything else. So regardless of generation counts, it's probably
worth it.
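
The check described above could look roughly like the following. This
is a minimal sketch and not git's actual code: the names
check_commit_date and ClockSkewError and the "force" parameter
(standing in for "-f") are hypothetical, and dates are assumed to be
integer seconds since the epoch.

```python
from typing import Iterable, Optional


class ClockSkewError(Exception):
    """Hypothetical error: the new commit is dated before a parent."""


def check_commit_date(commit_date: int,
                      parent_dates: Iterable[int],
                      force: bool = False) -> Optional[str]:
    """Return None if the date is fine, or a warning string if overridden.

    Raises ClockSkewError when the commit date is older than the newest
    parent commit date and force is not set.
    """
    # A root commit has no parents, so there is nothing to compare against.
    newest_parent = max(parent_dates, default=None)
    if newest_parent is None or commit_date >= newest_parent:
        return None

    warning = ("your clock is wrong - your commit date is before "
               "the commit date of your parents")
    if not force:
        raise ClockSkewError(warning)
    # Assumption: with the override, the commit goes ahead and only warns.
    return warning
```

With this shape, check_commit_date(50, [90]) raises, while the same
call with force=True returns the warning text and lets the caller
proceed.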

But if you think commit date is good enough for generation counts -
and I'm not arguing against it - then please tell me why you would
then want to have a separate generation count cache.

So I would like to repeat: I think our commit-date based hack has been
pretty successful. We've lived with it for years and years. Even the
"let's try to fix it by adding slop" code is from three years ago
(commit 7d004199d1), which means that for three years we never really
saw any serious problems. I forget what problem we actually did see -
I have this dim memory of it being Ted that had problems with a merge
because git picked a crap merge base, but that may just be my
Alzheimer's speaking.

Obviously there are cases where we miss some merge base and it doesn't
really end up mattering, so we may well have a *ton* of commits that
have bad dates, but they just haven't affected us enough for us to
care. That's fine too - I dislike how our algorithm isn't truly
reliable, but at the same time I think we're so robust that it all
works regardless.

So I think it's ugly and fairly hacky, but it has worked well enough
in practice. I dislike our commit dates, but I don't _hate_ them. I do
think it was a mistake, but not one I'm especially ashamed of.

So why do I dislike the generation count cache so much? I dislike it
exactly because

  "if the commit date isn't good enough, then dammit, we should have
   just added a generation count".

And if we should have added it six years ago, then we should add it
today. Not say "oh, we made a mistake six years ago, let's work around
the mistake instead of fixing it".

That's really what it boils down to. Let's not paper over a mistake.
Either we need the generation depth or we don't. And if we do need it,
we should replace the date-based hackery with it (where "replace" may
well be "still fall back on our traditional date-based hackery in the
absence of generation counters").
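
To make "replace, but fall back on dates" concrete, here is a minimal
sketch. It is not git's implementation: the Commit class, the function
names, and the choice of generation 0 for a root commit are all
assumptions made for illustration. A commit's generation is taken to
be one more than the largest generation among its parents, and old
commits without a stored counter have generation None.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Commit:
    commit_date: int                      # seconds since the epoch
    parents: List["Commit"] = field(default_factory=list)
    generation: Optional[int] = None      # None: no counter stored


def generation_for_new_commit(parents: List[Commit]) -> Optional[int]:
    """1 + max parent generation; a root commit gets 0 (assumed convention).

    Returns None if any parent lacks a counter (assumption: we do not
    walk history to compute the missing value here).
    """
    gens = [p.generation for p in parents]
    if any(g is None for g in gens):
        return None
    return 1 + max(gens, default=-1)


def may_be_ancestor(candidate: Commit, commit: Commit) -> bool:
    """Cheap filter: False means candidate cannot be an ancestor of commit.

    Uses generation counters when both commits have them, otherwise
    falls back on the traditional commit-date comparison.
    """
    if candidate.generation is not None and commit.generation is not None:
        return candidate.generation < commit.generation
    # Date-based fallback: only as reliable as the committers' clocks.
    return candidate.commit_date <= commit.commit_date
```

The point of the counter shows up when a clock is wrong: a child
committed with a date earlier than its parent still gets a larger
generation, so the filter keeps giving the right answer where the date
comparison would not.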

But if we decide that we don't really need generation counters AT ALL,
and can just continue with the commit date hack, then I'm personally
ok with that too.

So to me, it's an "either or" situation. Either the commit dates are
good enough, or we should add generation counts to the commits.

But in *neither* case is it ok to do some external cache to work around it.

                          Linus