On Tue, Apr 25, 2023 at 01:52:45AM -0400, Jeff King wrote:

> Here's a v2 of my series. The behavior should be identical, but I've
> incorporated some comment and small code tweaks based on feedback from
> the first round.
>
> I also added a fourth patch which adds a new comment explaining some of
> the cases that were alluded to in the earlier round's patch 3.
>
>   [1/4]: t4212: avoid putting git on left-hand side of pipe
>   [2/4]: parse_commit(): parse timestamp from end of line
>   [3/4]: parse_commit(): handle broken whitespace-only timestamp
>   [4/4]: parse_commit(): describe more date-parsing failure modes
>
>  commit.c               | 47 +++++++++++++++++++++++++++++++++++-------
>  t/t4212-log-corrupt.sh | 39 +++++++++++++++++++++++++++++++++--
>  2 files changed, 76 insertions(+), 10 deletions(-)

Whoops, forgot my range-diff (though nothing should be too surprising
based on the round 1 discussion):

1:  07932cf666 = 1:  ac38ce133d t4212: avoid putting git on left-hand side of pipe
2:  7ee34c7d5f ! 2:  f59e61262d parse_commit(): parse timestamp from end of line
    @@ Commit message
         parse back to the final ">". In theory we could use split_ident_line()
         here, but it's actually a bit more strict. In particular, it requires a
         valid time-zone token, too. That should be present, of course, but we
    -    wouldn't want to break --until for malformed cases that are working
    -    currently.
    +    wouldn't want to break --until for cases that are working currently.
     
         We might want to teach split_ident_line() to become more lenient there,
         but it would require checking its many callers (since right now they can
    @@ commit.c: static timestamp_t parse_commit_date(const char *buf, const char *tail
     -	if (buf >= tail)
     +
     +	/*
    -+	 * parse to end-of-line and then walk backwards, which
    -+	 * handles some malformed cases.
    ++	 * Jump to end-of-line so that we can walk backwards to find the
    ++	 * end-of-email ">". This is more forgiving of malformed cases
    ++	 * because unexpected characters tend to be in the name and email
    ++	 * fields.
     +	 */
     +	eol = memchr(buf, '\n', tail - buf);
     +	if (!eol)
      		return 0;
     -	dateptr = buf;
     -	while (buf < tail && *buf++ != '\n')
    -+	for (dateptr = eol; dateptr > buf && dateptr[-1] != '>'; dateptr--)
    - 		/* nada */;
    +-		/* nada */;
     -	if (buf >= tail)
    ++	dateptr = eol;
    ++	while (dateptr > buf && dateptr[-1] != '>')
    ++		dateptr--;
     +	if (dateptr == buf || dateptr == eol)
      		return 0;
     -	/* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */
3:  e8e94083f5 ! 3:  c62fc59bf1 parse_commit(): handle broken whitespace-only timestamp
    @@ Commit message
         It's not subject to the same bug, because it insists that there be one
         or more digits in the timestamp.
     
    -    We can use the same logic here. If there's a non-whitespace but
    -    non-digit value (say "committer name <email> foo"), then
    -    parse_timestamp() would already have returned 0 anyway. So the only
    -    change should be for this "whitespace only" case.
    -
         Signed-off-by: Jeff King <peff@xxxxxxxx>
     
      ## commit.c ##
     @@ commit.c: static timestamp_t parse_commit_date(const char *buf, const char *tail)
    - 	if (dateptr == buf || dateptr == eol)
    + 	dateptr = eol;
    + 	while (dateptr > buf && dateptr[-1] != '>')
    + 		dateptr--;
    +-	if (dateptr == buf || dateptr == eol)
    ++	if (dateptr == buf)
      		return 0;
     
    +-	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
     +	/*
    -+	 * trim leading whitespace; parse_timestamp() will do this itself, but
    -+	 * it will walk past the newline at eol while doing so. So we insist
    -+	 * that there is at least one digit here.
    ++	 * Trim leading whitespace; parse_timestamp() will do this itself, but
    ++	 * if we have _only_ whitespace, it will walk right past the newline
    ++	 * while doing so.
     +	 */
     +	while (dateptr < eol && isspace(*dateptr))
     +		dateptr++;
    -+	if (!strchr("0123456789", *dateptr))
    ++	if (dateptr == eol)
     +		return 0;
     +
    - 	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
    ++	/*
    ++	 * We know there is at least one non-whitespace character, so we'll
    ++	 * begin parsing there and stop at worst case at eol.
    ++	 */
      	return parse_timestamp(dateptr, NULL, 10);
      }
     
    +
      ## t/t4212-log-corrupt.sh ##
     @@ t/t4212-log-corrupt.sh: test_expect_success 'absurdly far-in-future date' '
-:  ---------- > 4:  28ed51a2ca parse_commit(): describe more date-parsing failure modes
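
For anyone who finds the range-diff hard to read as running code, here is a rough standalone sketch of where patches 2 and 3 seem to end up. It is not the code in commit.c: the function name sketch_parse_ident_date() is made up for illustration, it takes a single ident line rather than a whole commit buffer, and it uses the standard strtoumax() in place of git's parse_timestamp(), which is an assumption about equivalent behavior for these inputs.

```c
#include <ctype.h>
#include <inttypes.h>
#include <string.h>

/*
 * Hypothetical stand-in for the date-parsing tail of parse_commit_date().
 * "line" points at one ident line, e.g.
 *   "committer A U Thor <author@example.com> 1234567890 +0000\n"
 * and "tail" points one past the end of the (NUL-terminated) buffer.
 * strtoumax() is an assumed substitute for git's parse_timestamp().
 */
static uintmax_t sketch_parse_ident_date(const char *line, const char *tail)
{
	const char *eol, *dateptr;

	/* Jump to end-of-line; with no newline there is nothing to parse. */
	eol = memchr(line, '\n', tail - line);
	if (!eol)
		return 0;

	/* Walk backwards to just past the final ">" that closes the email. */
	dateptr = eol;
	while (dateptr > line && dateptr[-1] != '>')
		dateptr--;
	if (dateptr == line)
		return 0; /* no ">" on this line at all */

	/*
	 * Trim leading whitespace ourselves. The number parser would also
	 * skip it, but if the rest of the line is only whitespace it would
	 * skip the newline too and read digits from the following line.
	 */
	while (dateptr < eol && isspace((unsigned char)*dateptr))
		dateptr++;
	if (dateptr == eol)
		return 0;

	/* At least one non-whitespace character, so parsing stops by eol. */
	return strtoumax(dateptr, NULL, 10);
}
```

In this sketch a stray ">" in the name field does not matter, since the backwards walk stops at the last ">" on the line, and a whitespace-only date yields 0 even when the next line happens to start with digits.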