Re: [PATCH 2/3] parse_commit(): parse timestamp from end of line

Jeff King <peff@xxxxxxxx> · Tue, 25 Apr 2023 01:23:59 -0400

On Mon, Apr 24, 2023 at 10:05:42AM -0700, Junio C Hamano wrote:

> > +	/*
> > +	 * parse to end-of-line and then walk backwards, which
> > +	 * handles some malformed cases.
> > +	 */
> 
> I would say "parse to" -> "jump to", but technically moving forward
> looking for a LF byte is still "parsing".  "some" malformed cases
> being "most plausible" ones (due to how ident.c::fmt_ident() is what
> writes '>' after the string end-user gave as e-mail) may be worth
> mentioning.

I'll expand this to:

  /*
   * Jump to end-of-line so that we can walk backwards to find the
   * end-of-email ">". This is more forgiving of malformed cases
   * because unexpected characters tend to be in the name and email
   * fields.
   */

> > -	dateptr = buf;
> > -	while (buf < tail && *buf++ != '\n')
> > +	for (dateptr = eol; dateptr > buf && dateptr[-1] != '>'; dateptr--)
> >  		/* nada */;
> 
> OK.  Just a style thing, but I found that "; /* nada */" is easier
> to spot that there is an empty statement there.

I found it ugly, too, but it's from earlier. Since the point is to
advance "dateptr", it may be better to turn it into a while loop anyway,
which side-steps the empty statement altogether:

  dateptr = eol;
  while (dateptr > buf && dateptr[-1] != '>')
	dateptr--;

> > -	if (buf >= tail)
> > +	if (dateptr == buf || dateptr == eol)
> >  		return 0;
> 
> Curious when dateptr that wanted to scan back from eol is still at
> eol after the loop.  It is when the ident line ends with ">" without
> any timestamp/tz info.   And the reason why we need to check that
> here is ...

Yeah, though as you saw in the next patch, it is not really sufficient. :)

I think it may be redundant, though.

In the original we already bailed earlier if we didn't find a ">"
(because "buf >= tail" after we advanced it looking for ">"). Here
"dateptr == buf" is checking the same thing (because we walked
backwards).

In the original we'd bail if we failed to find a newline as part of this
loop (because "buf >= tail" after looking for a newline). But that can't
happen here; we've already bailed after memchr() failed to find a
newline). So "dateptr == eol" only triggers if there were no characters
in "buf" to look at.

So I added it only to keep this comment trivially true:

> > -	/* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */
> > +
> > +	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
> >  	return parse_timestamp(dateptr, NULL, 10);
> 
> ... because parse_timestamp() is merely strtoumax() and would
> happily skip over arbitrary number of leading "whitespace" without
> stopping if (dateptr == eol && *eol == '\n').  OK, sad but correct.

But it could also read "dateptr <= eol" and still be true (which is to
say it is mostly accurate, but not quite because of the "soaking up
whitespace" problem fixed by the next patch.

I'll leave the extra condition in this patch, since it's orthogonal to
what this patch is fixing. But in the next one I'll remove it and expand
the comment to explain a bit more.

-Peff