Re: Weird behavior of 'git log --before' or 'git log --date-order': Commits from 2011 are treated to be before 1980

Jeff King <peff@xxxxxxxx> · Tue, 18 Apr 2023 00:12:53 -0400

On Mon, Apr 17, 2023 at 02:51:42AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > There may be also cases where the two diverge. Obviously having two
> > parsers isn't ideal. I think sharing the code may involve a lot of work,
> > though. The pretty-print parser is interested in pulling out more
> > information, and is less focused on performance. Parsing commits is
> > traditionally a hot path, as we historically had to parse every commit,
> > even if we weren't showing it (including non-log operations like
> > computing merge bases, reachability, and so forth).
> >
> > But that may not matter so much. One, we already inflate the whole
> > commit object, not just the header. So even if we spend a few extra
> > instructions on parsing, it may not be noticeable. And two, these days
> > we often cache commit metadata in the commit-graph files, which avoids
> > loading the commit message entirely (and thus this parsing) for most
> > operations.
> 
> Makes readers wonder which parser is used to parse commit objects in
> order to populate the commit-graph files.  If that step screws up,
> we'd record a broken timestamp there X-<.

It should be be the parse_commit() one, since the commit-graph writing
code works off of parsed "struct commit" objects to find its data. Which
is good, since we may silently optimize out a parse in favor of the
commit-graph; we'd want the two to always match. So recording a broken
timestamp there is OK (but see below).

Where it gets trickier is if we then fix the parser to handle this case.
Now the commit-graph stores a broken value, and further writes are
clever enough to say "ah, I already have that commit in the graph; no
need to recompute its values". So once you start using a version of Git
with the more lenient parser, you'd have to manually blow away any graph
files and rebuild to see the benefit.

One thing the commit graph perhaps _could_ do is omit the commit, or
mark it as "this one is broken in some way". And then fall back to
parsing those few instead (which is slower, but if it's a small minority
of commits, that's OK). But I don't think there's any code for that. And
it's kind of tricky anyway, because the parser just sets the date to "0"
with no indication that there was any potential error. So it's probably
solvable, but perhaps not worth the effort. The current rule is just
"whatever we got from parsing is what the commit-graph will return",
which works well in practice unless the parser changes.

-Peff