Re: [PATCH 4/5] replace strbuf_expand() with strbuf_expand_step()

René Scharfe <l.s.r@xxxxxx> · Tue, 27 Jun 2023 18:31:55 +0200

Am 27.06.23 um 10:54 schrieb Jeff King:
> On Sat, Jun 17, 2023 at 10:43:17PM +0200, René Scharfe wrote:
>
>> Avoid the overhead of passing context to a callback function of
>> strbuf_expand() by using strbuf_expand_step() in a loop instead.  It
>> requires explicit handling of %% and unrecognized placeholders, but is
>> simpler, more direct and avoids void pointers.
>
> I like this. I don't care that much about the runtime overhead of
> passing the context around, but if you meant by "overhead" the
> programmer time and confusion in stuffing everything into context
> structs, then I agree this is much better. :)

Indeed, I meant the burden of being forced to define a struct and
filling in all necessary context.  Bureaucratic overhead.

> It does still feel like we should be handling "%%" on behalf of the
> callers.

I feel the same, but restrained myself from doing that, so that we
can see all the pieces for now.  It allows us to recombine them in
better ways than before.

>>  builtin/cat-file.c |  35 +++++++--------
>>  builtin/ls-files.c | 109 +++++++++++++++++++--------------------------
>>  builtin/ls-tree.c  | 107 +++++++++++++++++---------------------------
>>  daemon.c           |  61 ++++++++-----------------
>>  pretty.c           |  72 ++++++++++++++++++------------
>>  strbuf.c           |  20 ---------
>>  strbuf.h           |  37 ++-------------
>>  7 files changed, 169 insertions(+), 272 deletions(-)
>
> The changes mostly looked OK to me (and the diffstat is certainly
> pleasing). The old callbacks returned a "consumed" length, and we need
> each "step" caller to now do "format += consumed" themselves. At first
> glance I thought there were cases where you didn't, but then I realized
> that you are relying on skip_prefix() to do that incrementing. Which
> makes sense and the resulting code looks nice, but it took me a minute
> to realize what was going on.

*nod*  The "returns consumed length" style is still used by
strbuf_expand_literal, though, so we have a bit of a mix.  Which
works, but a uniform convention might be better.

>> @@ -1894,7 +1880,26 @@ void userformat_find_requirements(const char *fmt, struct userformat_want *w)
>>  			return;
>>  		fmt = user_format;
>>  	}
>> -	strbuf_expand(&dummy, fmt, userformat_want_item, w);
>> +	while (strbuf_expand_step(&dummy, &fmt)) {
>> +		if (skip_prefix(fmt, "%", &fmt))
>> +			continue;
>> +
>> +		if (*fmt == '+' || *fmt == '-' || *fmt == ' ')
>> +			fmt++;
>> +
>> +		switch (*fmt) {
>> +		case 'N':
>> +			w->notes = 1;
>> +			break;
>> +		case 'S':
>> +			w->source = 1;
>> +			break;
>> +		case 'd':
>> +		case 'D':
>> +			w->decorate = 1;
>> +			break;
>> +		}
>> +	}
>>  	strbuf_release(&dummy);
>>  }
>
> This one actually doesn't increment the format (so we restart the
> expansion on "N" or whatever). But neither did the original! It always
> returned 0:
>
>> -static size_t userformat_want_item(struct strbuf *sb UNUSED,
>> -				   const char *placeholder,
>> -				   void *context)
>> -{
>> -	struct userformat_want *w = context;
>> -
>> -	if (*placeholder == '+' || *placeholder == '-' || *placeholder == ' ')
>> -		placeholder++;
>> -
>> -	switch (*placeholder) {
>> -	case 'N':
>> -		w->notes = 1;
>> -		break;
>> -	case 'S':
>> -		w->source = 1;
>> -		break;
>> -	case 'd':
>> -	case 'D':
>> -		w->decorate = 1;
>> -		break;
>> -	}
>> -	return 0;
>> -}
>
> So probably OK, though a little funny.
>
> It also feels like this whole function would be just as happy using
> "strchr()", since it throws away the expanded result anyway. But that
> can be for another time. :)

Good idea!  And the conversion to a loop brings us halfway there.

>
>> @@ -1912,7 +1917,16 @@ void repo_format_commit_message(struct repository *r,
>>  	const char *output_enc = pretty_ctx->output_encoding;
>>  	const char *utf8 = "UTF-8";
>>
>> -	strbuf_expand(sb, format, format_commit_item, &context);
>> +	while (strbuf_expand_step(sb, &format)) {
>> +		size_t len;
>> +
>> +		if (skip_prefix(format, "%", &format))
>> +			strbuf_addch(sb, '%');
>> +		else if ((len = format_commit_item(sb, format, &context)))
>> +			format += len;
>> +		else
>> +			strbuf_addch(sb, '%');
>> +	}
>
> This one doesn't advance the format for a not-understood placeholder.
> But that's OK, because we know it isn't "%", so starting the search from
> there again is correct.

Right.  This is copied from strbuf_expand.

>
>> @@ -1842,7 +1852,7 @@ static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
>>  	}
>>
>>  	orig_len = sb->len;
>> -	if (((struct format_commit_context *)context)->flush_type != no_flush)
>> +	if ((context)->flush_type != no_flush)
>>  		consumed = format_and_pad_commit(sb, placeholder, context);
>>  	else
>>  		consumed = format_commit_one(sb, placeholder, context);
>
> Since we're no longer casting, the extra parentheses seem redundant now.
> I.e., this can just be context->flush_type.

Indeed.

René