Jeff King <peff@xxxxxxxx> writes: > To find the committer timestamp, we parse left-to-right looking for the > closing ">" of the email, and then expect the timestamp right after > that. But we've seen some broken cases in the wild where this fails, but > we _could_ find the timestamp with a little extra work. E.g.: > > Name <Name<email>> 123456789 -0500 > ... > So let's use the same trick as 03818a4a94: find the end of the line, and > parse back to the final ">". This obviously assumes that even in a broken ident, it is very likely that the second component (where <e-mail> usually sits) ends with a '>'. Given that we enclose whatever garbage the end user gave us as their e-mail inside a pair of <> ourselves, it is a very sensible assumption, I would think. The original parser assumed that the end user would not have '>' inside their e-mail part of the ident, which turns out to be more problematic than the alternative being proposed. It is doubly good that we already parse from the end elsewhere. Nice. > + /* > + * parse to end-of-line and then walk backwards, which > + * handles some malformed cases. > + */ I would say "parse to" -> "jump to", but technically moving forward looking for a LF byte is still "parsing". "some" malformed cases being "most plausible" ones (due to how ident.c::fmt_ident() is what writes '>' after the string end-user gave as e-mail) may be worth mentioning. > + eol = memchr(buf, '\n', tail - buf); > + if (!eol) > return 0; OK. > - dateptr = buf; > - while (buf < tail && *buf++ != '\n') > + for (dateptr = eol; dateptr > buf && dateptr[-1] != '>'; dateptr--) > /* nada */; OK. Just a style thing, but I found that "; /* nada */" is easier to spot that there is an empty statement there. > - if (buf >= tail) > + if (dateptr == buf || dateptr == eol) > return 0; Curious when dateptr that wanted to scan back from eol is still at eol after the loop. It is when the ident line ends with ">" without any timestamp/tz info. And the reason why we need to check that here is ... > - /* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */ > + > + /* dateptr < eol && *eol == '\n', so parsing will stop at eol */ > return parse_timestamp(dateptr, NULL, 10); ... because parse_timestamp() is merely strtoumax() and would happily skip over arbitrary number of leading "whitespace" without stopping if (dateptr == eol && *eol == '\n'). OK, sad but correct.