On Mon, Apr 17, 2023 at 02:51:42AM -0700, Junio C Hamano wrote: > Jeff King <peff@xxxxxxxx> writes: > > > There may be also cases where the two diverge. Obviously having two > > parsers isn't ideal. I think sharing the code may involve a lot of work, > > though. The pretty-print parser is interested in pulling out more > > information, and is less focused on performance. Parsing commits is > > traditionally a hot path, as we historically had to parse every commit, > > even if we weren't showing it (including non-log operations like > > computing merge bases, reachability, and so forth). > > > > But that may not matter so much. One, we already inflate the whole > > commit object, not just the header. So even if we spend a few extra > > instructions on parsing, it may not be noticeable. And two, these days > > we often cache commit metadata in the commit-graph files, which avoids > > loading the commit message entirely (and thus this parsing) for most > > operations. > > Makes readers wonder which parser is used to parse commit objects in > order to populate the commit-graph files. If that step screws up, > we'd record a broken timestamp there X-<. It should be be the parse_commit() one, since the commit-graph writing code works off of parsed "struct commit" objects to find its data. Which is good, since we may silently optimize out a parse in favor of the commit-graph; we'd want the two to always match. So recording a broken timestamp there is OK (but see below). Where it gets trickier is if we then fix the parser to handle this case. Now the commit-graph stores a broken value, and further writes are clever enough to say "ah, I already have that commit in the graph; no need to recompute its values". So once you start using a version of Git with the more lenient parser, you'd have to manually blow away any graph files and rebuild to see the benefit. One thing the commit graph perhaps _could_ do is omit the commit, or mark it as "this one is broken in some way". And then fall back to parsing those few instead (which is slower, but if it's a small minority of commits, that's OK). But I don't think there's any code for that. And it's kind of tricky anyway, because the parser just sets the date to "0" with no indication that there was any potential error. So it's probably solvable, but perhaps not worth the effort. The current rule is just "whatever we got from parsing is what the commit-graph will return", which works well in practice unless the parser changes. -Peff