Re: [PATCH v2 3/4] parse_commit(): handle broken whitespace-only timestamp

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 26 Apr 2023 08:32:49 -0700

Jeff King <peff@xxxxxxxx> writes:

> Instead, we should go back to what the original iteration of the series
> was doing, and make sure there is at least one digit (i.e., a "forbid
> unknown" strategy). Assuming that there is no locale where ascii "1" is
> considered whitespace. ;)
>
> Note that will exclude a few cases that we do allow now, like:
>
>   committer name <email> \v123456 +0000\n
>
> Right now that parses as "123456", but we'd reject it as "0" after such
> a patch.

I would say that it is a good thing.

The only (somewhat) end-user controlled things on the line are the
name and email, and even there name is sanitized to remove "crud".
The user-supplied timestamp goes through date.c::parse_date(),
ending up with what date.c::date_string() formats, so there will not
be syntactically incorrect timestamp there.  So we can be strict
format-wise on the timestamp field, once we identify where it begins,
which is the point of scanning backwards for '>'.

Unless the user does "hash-object" and deliberately creates a
malformed commit object---they can keep both halves just fine in
such a case as long as we do reject such a timestamp correctly.

> The alternative is to check _all_ of the characters between ">" and the
> newline and make sure there is some digit somewhere, which would be
> sufficient to prevent strtoumax() from walking past the newline.
>
> I guess it's not even any more expensive in the normal case (since the
> very first non-whitespace entry should be a digit!). I'm not sure it's
> worth caring about too much either way. Garbage making it into
> name/email is an easy mistake to make (for users and implementations).
> Putting whitespace control codes into your timestamp is not, and marking
> them as "0" is an OK outcome.

Yeah, I think it is fine either way.

Thanks