Re: [PATCH 5/5] pretty describe: add %ds, %dn, %dd placeholders

René Scharfe <rene.scharfe@xxxxxxxxxxxxxx> · Sun, 04 Nov 2007 18:27:39 +0100

Johannes Schindelin schrieb:
> Hi,
> 
> On Sun, 4 Nov 2007, Ren? Scharfe wrote:
> 
>> Johannes Schindelin schrieb:
>>
>>> On Sun, 4 Nov 2007, Ren? Scharfe wrote:
>>>
>>>> +	unsigned long occurs[ARRAY_SIZE(table)];
>>> You do not ever use the counts.  So, longs are overkill.  Even ints 
>>> might be overkill, but probably the most convenient.  I would have 
>>> gone with chars.  If I knew how to memset() an array of unsigned:1 
>>> entries to all zero, I would even have gone with that, but the runtime 
>>> cost of this is probably higher than the chars.
>> Well, it isn't used in format_commit_message() currently, but it could 
>> be.  Multiply the count and and the length of each substitution (minus 
>> the length of the placeholder) and you get the number of bytes you need 
>> to allocate.  interpolate() wouldn't need to be called twice anymore.
> 
> The better change, of course, would be to migrate interpolate() to strbuf.  
> Then you do not have to play clever tricks anymore.
>
>>> But the even more fundamental problem is that you count the needed 
>>> interpolations _every_ single time you output a commit message.
>>>
>>> A much better place would be get_commit_format().  Yes that means 
>>> restructuring the code a bit more, but I would say that this definitely 
>>> would help.  My preference would even be introducing a new source file for 
>>> the user format handling (commit-format.[ch]).
>> Counting the interpolations is easier than actually interpolating. 
>> Wherever the code goes, the calls to interpolate() and interp_count() 
>> should stay together.
> 
> No.
> 
> The purpose of calling interp_count() is to know what placeholders have to 
> be filled with substitutes.  As a consequence, the _logical_ thing to do 
> is call interp_count() _once_.
> 
> It makes absolutely no sense to call the function over and over again, 
> only to end up with the same result over and over again.

To allow this optimization, you need to make the (not yet filled)
interpolation table available to the new callsite of interp_count().
And you need to somehow pass the result of interp_count() from every
caller of it to the setup code in format_commit_message().

To see if it's worthwhile, I've just replaced the array "occurs" and the
call to interp_count() with a static array, and measured the runtime.
The speed difference was lost in the noise.

>>>> +
>>>> +/*
>>>> + * interp_count - count occurences of placeholders
>>>> + */
>>>> +void interp_count(unsigned long *result, const char *orig,
>>>> +                  const struct interp *interps, int ninterps)
>>>> +{
>>>> +	const char *src = orig;
>>> You do not ever use orig again.  So why not just use that variable instead 
>>> of introducing a new one?
>> I simply copied interpolate() and then chopped off the parts not needed
>> for counting, to make it easy to see that this is the smaller brother.
> 
> It is not.  It does not do any substitution.  It is a pure helper for the 
> process of filling the interpolation table.

Sure.  It's important, though, that it reports the same number of
substitutions as interpolate() later actually performs.  Correctness
trumps cleanliness, and its easier to check that a copy is correct, even
if certain pieces are missing.

>>> I'd rewrite this whole loop as
>>>
>>> 	while ((c = *(orig++)))
>>> 		if (c == '%')
>>> 			/* Try to match an interpolation string. */
>>> 			for (i = 0; i < ninterps; i++)
>>> 				if (prefixcmp(orig, interps[i].name)) {
>>> 					result[i] = 1;
>>> 					orig += strlen(interps[i].name);
>>> 					break;
>>> 				}
>> Cleanups are sure possible, but they should be done on top, and to both 
>> interpolate() and interp_count().  Let's first see how far we get with 
>> dumb code-copying and reusing the result in new ways. :)
> 
> Code copying is one of the primary sources for bad code.  Let's not even 
> start.
> 
> IMHO there have to be _very_ good reasons to commit something that you 
> plan to fix later anyway.

Code copying can be bad if one copies bugs.  But code copying allows a
strange feat: new code can inherit maturity.  If you copy known good
code and then change it in trivial ways (keeping the structure etc.) to
make it do new things, then the chance of a bug creeping in is lower
than if you wrote that piece of code anew.

> One such good reason would be that it is too hard to do in one go.  
> Another good reason would be that you think the fix is not even needed 
> (like I did when I wrote format: in the first place; I am quite surprised 
> that after _that_ long a time people complain -- I'd have expected 
> complaints right away or never).

Not everybody is as fast as you, Dscho. ;-)

Another idea that I was kicking around, but didn't get time to
implement: a performance regression test suite, i.e. make test for
timings and memory usages.

> In this case, I see no reason why we should go for suboptimal code first.
> 
> But hey, if you do not want to do it, I'll do it.  Just say so.

Busted again!  I wanted to see if someone else would pick up the
janitorial work for me. :-)

In any case, interpolate.c needs some attention, with or without my
patch.  I agree that a native strbuf version would be nice.  How about
an interface like that:

	typedef const char *(*expand_fn_t)
		(const char *placeholder, void *context);
	void strbuf_addexpand(struct strbuf *sb, const char *format,
	                      const char **placeholders,
	                      expand_fn_t fn, void *context);

strbuf_addexpand() would call fn() when it recognizes a placeholder,
avoiding unneeded setup code.  It could cache the result, so that fn()
gets called at most a single time for each given placeholder.  context
would be passed through to fn(), e.g. a struct commit in case of
format_commit_message().  Makes sense?

Thanks,
René
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html