Re: [PATCH] general style: replaces memcmp() with proper starts_with()

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 13 Mar 2014 10:46:01 -0700

Quint Guvernator <quintus.public@xxxxxxxxx> writes:

>> The result after the conversion, however, still have the same magic
>> numbers, but one less of them each.  Doesn't it make it harder to
>> later spot the patterns to come up with a better abstraction that
>> does not rely on the magic number?
>
> It is _not_ my goal to make the code harder to maintain down the road.

Good.

> So, at this point, which hunks (if any) are worth patching?

Will, I am not going through all the mechanical hits to memcmp() and
judge each and every one if it is a good idea to convert.  Anybody
who does so in order to tell you "which hunks are worth patching"
would end up being the one doing the real work, and at that point
there is nothing left to be credited as your work anymore ;-)

But as Peff said, there are good bits, like these ones just for a
few examples.

diff --git a/builtin/apply.c b/builtin/apply.c
index a7e72d5..16c20af 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -1631,7 +1631,7 @@ static int parse_fragment(const char *line, unsigned long size,
 		 * l10n of "\ No newline..." is at least that long.
 		 */
 		case '\\':
-			if (len < 12 || memcmp(line, "\\ ", 2))
+			if (len < 12 || !starts_with(line, "\\ "))
 				return -1;
 			break;
 		}
@@ -1646,7 +1646,7 @@ static int parse_fragment(const char *line, unsigned long size,
 	 * it in the above loop because we hit oldlines == newlines == 0
 	 * before seeing it.
 	 */
-	if (12 < size && !memcmp(line, "\\ ", 2))
+	if (12 < size && starts_with(line, "\\ "))
 		offset += linelen(line, size);
 
 	patch->lines_added += added;

These two are about "An incomplete line marker begins with a
backslash and a SP" and there is no other significance in the
constant 2 (like, "after we recognise the match, we start scanning
the remainder of the line starting at the offset 2").

It is a tangent but I notice that these two parts (both in the
original and in the version after patch) contradict what the
incomplete last line marker should look like in a minor detail.

On the other hand, I think this one from nearby is iffy.

@@ -846,8 +846,8 @@ static int has_epoch_timestamp(const char *nameline)
 	 * YYYY-MM-DD hh:mm:ss must be from either 1969-12-31
 	 * (west of GMT) or 1970-01-01 (east of GMT)
 	 */
-	if ((zoneoffset < 0 && memcmp(timestamp, "1969-12-31", 10)) ||
-	    (0 <= zoneoffset && memcmp(timestamp, "1970-01-01", 10)))
+	if ((zoneoffset < 0 && !starts_with(timestamp, "1969-12-31")) ||
+	    (0 <= zoneoffset && !starts_with(timestamp, "1970-01-01")))
 		return 0;
 
 	hourminute = (strtol(timestamp + 11, NULL, 10) * 60 +

If one looks at the post-context of the hunk, one would realize that
this codepath very intimately knows how the timestamp should look
like at the byte-offset level, not just "YYYY-MM-DD ought to be
10-byte long", but "there should be two-digit hour part after
skipping one byte after that YYYY-MM-DD part, followed by two-digit
minute part after further skipping one byte", knowing that these
details are guaranteed by the stamp_regexp[] pattern it earlier made
sure the given string would match.

I do not think it is a good idea to reduce this kind of precise
format knowledge from this function in the first place (after all,
this is parsing a header line in a traditional diff whose format is
known to us), and even if it were our eventual goal to reduce the
precise format knowledge, it would not help very much to get rid of
constant 10 only from these two memcmp() calls, and that is why I
think this hunk is iffy.

Hope the above shows what kind of thinking is needed to judge each
change from memcmp() to !starts_with().

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html