Re: [PATCH] general style: replaces memcmp() with proper starts_with()

Jeff King <peff@xxxxxxxx> · Wed, 12 Mar 2014 17:14:15 -0400

On Wed, Mar 12, 2014 at 01:08:03PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> >> Blindly replacing starts_with() with !memcmp() in the above part is
> >> a readability regression otherwise.
> >
> > I actually think the right solution is:
> >
> >   static inline int standard_header_field(const char *field, size_t len)
> >   {
> >           return mem_equals(field, len, "tree ") ||
> >                  mem_equals(field, len, "parent ") ||
> >                  ...;
> >   }
> >
> > and the caller should tell us it's OK to look at field[len]:
> >
> >   standard_header_field(line, eof - line + 1)
> >
> > We could also omit the space from the standard_header_field.
> 
> Yes, that was what I had in mind.  The only reason why the callee
> (over-)optimizes the "SP must follow these know keywords" part by
> using the extra "len" parameter is because the caller has to do a
> single strchr() to skip an arbitrary field name anyway so computing
> "len" is essentially free.

One thing that bugs me about the current code is that the sub-function
looks one past the end of the length given to it by the caller.
Switching it to pass "eof - line + 1" resolves that, but is it right?

The character pointed at by "eof" is either:

  1. space, if our strchr(line, ' ') found something

  2. the first character of the next line, if our
     memchr(line, '\n', eob - line) found something

  3. Whatever is at eob (end-of-buffer)

There are two questionable things here. In (1), we use strchr on a
sized buffer.  And in (3), we look one past the size that was passed in.

In both cases, we are saved by the fact that the buffer is actually NUL
terminated at the end of "size" (because it comes from read_sha1_file).
But we may find a space much further than the line ending which is
supposed to be our boundary, and end up having to do a comparison to
cover this case.

So I think the current code is correct, but we could make it more
obvious by:

  1. Restricting our search for the field separator to the current line.

  2. Explicitly avoid looking for headers when we did not find a space,
     since we cannot match anything anyway.

Like:

diff --git a/commit.c b/commit.c
index 6bf4fe0..9383cc1 100644
--- a/commit.c
+++ b/commit.c
@@ -1325,14 +1325,14 @@ static struct commit_extra_header *read_commit_extra_header_lines(
 		strbuf_reset(&buf);
 		it = NULL;
 
-		eof = strchr(line, ' ');
-		if (next <= eof)
+		eof = memchr(line, ' ', next - line);
+		if (eof) {
+			if (standard_header_field(line, eof - line + 1) ||
+			    excluded_header_field(line, eof - line, exclude))
+				continue;
+		} else
 			eof = next;
 
-		if (standard_header_field(line, eof - line) ||
-		    excluded_header_field(line, eof - line, exclude))
-			continue;
-
 		it = xcalloc(1, sizeof(*it));
 		it->key = xmemdupz(line, eof-line);
 		*tail = it;

I also think that "eof = next" (which I retained here) is off-by-one.
"next" here is not the newline, but the start of the next line. And I'm
guessing the code actually wanted the newline (otherwise "it->key" ends
up with the newline in it). But we cannot just subtract one, because if
we hit "eob", it really is in the right spot.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html