Re: [PATCH 5/5] pretty describe: add %ds, %dn, %dd placeholders

René Scharfe <rene.scharfe@xxxxxxxxxxxxxx> · Sun, 04 Nov 2007 15:42:03 +0100

Johannes Schindelin schrieb:
> Hi,
> 
> On Sun, 4 Nov 2007, Ren? Scharfe wrote:
> 
>> The new placeholders %ds (description string, git-describe style), %dn 
>> (the name part) and %dd (the depth part) are added.
>>
>> To avoid imposing the significant cost of running describe_commit() on 
>> every format string, even if none of the new placeholders are used, a 
>> new function, interp_count(), is introduced.  It is a stripped down 
>> version of interpolate(), that simply counts the placeholders in a 
>> format string.  If the describe placeholders are not found, setting up 
>> the corresponding replacements is skipped.
> 
> The way I read this, those are two really quite independent patches 
> squashed into one.

Busted!  I didn't want to introduce a performance regression with the
%ds parsing code, but I also didn't want to add a function without
users.  Patch 6 was then glued on as an afterthought -- the rest of the
series was ready when I saw Paul's mail.

So, a better way might be to move patch 6 to the head of the series and
introduce interp_count() in it, too.

>> +	unsigned long occurs[ARRAY_SIZE(table)];
> 
> You do not ever use the counts.  So, longs are overkill.  Even ints might 
> be overkill, but probably the most convenient.  I would have gone with 
> chars.  If I knew how to memset() an array of unsigned:1 entries to all 
> zero, I would even have gone with that, but the runtime cost of this is 
> probably higher than the chars.

Well, it isn't used in format_commit_message() currently, but it could
be.  Multiply the count and and the length of each substitution (minus
the length of the placeholder) and you get the number of bytes you need
to allocate.  interpolate() wouldn't need to be called twice anymore.

> But the even more fundamental problem is that you count the needed 
> interpolations _every_ single time you output a commit message.
> 
> A much better place would be get_commit_format().  Yes that means 
> restructuring the code a bit more, but I would say that this definitely 
> would help.  My preference would even be introducing a new source file for 
> the user format handling (commit-format.[ch]).

Counting the interpolations is easier than actually interpolating.
Wherever the code goes, the calls to interpolate() and interp_count()
should stay together.

>> +
>> +/*
>> + * interp_count - count occurences of placeholders
>> + */
>> +void interp_count(unsigned long *result, const char *orig,
>> +                  const struct interp *interps, int ninterps)
>> +{
>> +	const char *src = orig;
> 
> You do not ever use orig again.  So why not just use that variable instead 
> of introducing a new one?

I simply copied interpolate() and then chopped off the parts not needed
for counting, to make it easy to see that this is the smaller brother.

> 
>> +	const char *name;
>> +	unsigned long namelen;
>> +	int i;
>> +	char c;
>> +
>> +	for (i = 0; i < ninterps; i++)
>> +		result[i] = 0;
> 
> memset()?

In earlier versions there was a memset() there.  I replaced it to make
the intent even more clear, but I guess not using memset() is simply
superstition..

> 
>> +
>> +	while ((c = *src)) {
>> +		if (c == '%') {
>> +			/* Try to match an interpolation string. */
>> +			for (i = 0; i < ninterps; i++) {
>> +				name = interps[i].name;
>> +				namelen = strlen(name);
>> +				if (strncmp(src, name, namelen) == 0)
> 
> prefixcmp()?

Copied from interpolate()..

>> +					break;
>> +			}
>> +
>> +			/* Check for valid interpolation. */
>> +			if (i < ninterps) {
> 
> This is ugly.  You already had a successful if() for that case earlier.

Dito..

> 
>> +				result[i]++;
>> +				src += namelen;
>> +				continue;
>> +			}
>> +		}
>> +		src++;
>> +	}
>> +}
> 
> I'd rewrite this whole loop as
> 
> 	while ((c = *(orig++)))
> 		if (c == '%')
> 			/* Try to match an interpolation string. */
> 			for (i = 0; i < ninterps; i++)
> 				if (prefixcmp(orig, interps[i].name)) {
> 					result[i] = 1;
> 					orig += strlen(interps[i].name);
> 					break;
> 				}

Cleanups are sure possible, but they should be done on top, and to both
interpolate() and interp_count().  Let's first see how far we get with
dumb code-copying and reusing the result in new ways. :)

Thanks,
René
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html