Re: [PATCH v2] strbuf: add and use strbuf_insertstr()

René Scharfe <l.s.r@xxxxxx> · Tue, 11 Feb 2020 17:18:02 +0100

Am 11.02.20 um 00:44 schrieb Jeff King:
> On Sun, Feb 09, 2020 at 12:36:22PM -0500, Eric Sunshine wrote:
>
>> On Sun, Feb 9, 2020 at 8:45 AM René Scharfe <l.s.r@xxxxxx> wrote:
>>> Add a function for inserting a C string into a strbuf.  Use it
>>> throughout the source to get rid of magic string length constants and
>>> explicit strlen() calls.
>>>
>>> Like strbuf_addstr(), implement it as an inline function to avoid the
>>> implicit strlen() calls to cause runtime overhead.
>>>
>>> Signed-off-by: René Scharfe <l.s.r@xxxxxx>
>>> ---
>>> diff --git a/mailinfo.c b/mailinfo.c
>>> @@ -570,7 +570,7 @@ static int check_header(struct mailinfo *mi,
>>>                 len = strlen("Content-Type: ");
>>>                 strbuf_add(&sb, line->buf + len, line->len - len);
>>>                 decode_header(mi, &sb);
>>> -               strbuf_insert(&sb, 0, "Content-Type: ", len);
>>> +               strbuf_insertstr(&sb, 0, "Content-Type: ");
>>>                 handle_content_type(mi, &sb);
>>
>> Meh. We've already computed the length of "Content-Type: " a few lines
>> earlier, so taking advantage of that value when inserting the string
>> literal is perfectly sensible. Thus, I'm not convinced that this
>> change is an improvement.
>
> I had a similar thought. I kind of wonder if all of these "len" bits
> (and their repeated strings) could go away if cmp_header() just used
> skip_iprefix() under the hood and had a pointer out-parameter.
>
> Something like the (largely untested) patch below. That would also make
> it easy to support arbitrary amounts of whitespace after the header,
> which I think are allowed by the standard (whereas now we'd parse
> "Content-type:    text/plain" as "    text/plain", which is silly).
>
> Worth doing?

Sure.

> ---
> diff --git a/mailinfo.c b/mailinfo.c
> index b395adbdf2..bbb5b429f8 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -346,11 +346,16 @@ static const char *header[MAX_HDR_PARSED] = {
>  	"From","Subject","Date",
>  };
>
> -static inline int cmp_header(const struct strbuf *line, const char *hdr)
> +static inline int cmp_header(const struct strbuf *line, const char *hdr,
> +			     const char **outval)
>  {
> -	int len = strlen(hdr);
> -	return !strncasecmp(line->buf, hdr, len) && line->len > len &&
> -			line->buf[len] == ':' && isspace(line->buf[len + 1]);
> +	const char *val;
> +	if (!skip_iprefix(line->buf, hdr, &val) ||
> +	    *val++ != ':' ||
> +	    !isspace(*val++))
> +		return 0;
> +	*outval = val;
> +	return 1;
>  }

And you could rename it to skip_header() to fix the issue that its name
starts with cmp but its return value is the inverse of a cmp-style
function.

And it could take a char pointer instead of a strbuf, to reduce its
dependencies and make it more widely useful -- but that might also be
a case of YAGNI.

>
>  static int is_format_patch_separator(const char *line, int len)
> @@ -547,17 +552,17 @@ static int check_header(struct mailinfo *mi,
>  			const struct strbuf *line,
>  			struct strbuf *hdr_data[], int overwrite)
>  {
> -	int i, ret = 0, len;
> +	int i, ret = 0;
>  	struct strbuf sb = STRBUF_INIT;
> +	const char *val;
>
>  	/* search for the interesting parts */
>  	for (i = 0; header[i]; i++) {
> -		int len = strlen(header[i]);
> -		if ((!hdr_data[i] || overwrite) && cmp_header(line, header[i])) {
> +		if ((!hdr_data[i] || overwrite) && cmp_header(line, header[i], &val)) {
>  			/* Unwrap inline B and Q encoding, and optionally
>  			 * normalize the meta information to utf8.
>  			 */
> -			strbuf_add(&sb, line->buf + len + 2, line->len - len - 2);
> +			strbuf_addstr(&sb, val);

That assumes the header value never contains NULs.  Valid?

>  			decode_header(mi, &sb);
>  			handle_header(&hdr_data[i], &sb);
>  			ret = 1;
> @@ -566,26 +571,22 @@ static int check_header(struct mailinfo *mi,
>  	}
>
>  	/* Content stuff */
> -	if (cmp_header(line, "Content-Type")) {
> -		len = strlen("Content-Type: ");
> -		strbuf_add(&sb, line->buf + len, line->len - len);
> +	if (cmp_header(line, "Content-Type", &val)) {
> +		strbuf_addstr(&sb, val);
>  		decode_header(mi, &sb);
> -		strbuf_insert(&sb, 0, "Content-Type: ", len);
>  		handle_content_type(mi, &sb);
>  		ret = 1;
>  		goto check_header_out;
>  	}
> -	if (cmp_header(line, "Content-Transfer-Encoding")) {
> -		len = strlen("Content-Transfer-Encoding: ");
> -		strbuf_add(&sb, line->buf + len, line->len - len);
> +	if (cmp_header(line, "Content-Transfer-Encoding", &val)) {
> +		strbuf_addstr(&sb, val);
>  		decode_header(mi, &sb);
>  		handle_content_transfer_encoding(mi, &sb);
>  		ret = 1;
>  		goto check_header_out;
>  	}
> -	if (cmp_header(line, "Message-Id")) {
> -		len = strlen("Message-Id: ");
> -		strbuf_add(&sb, line->buf + len, line->len - len);
> +	if (cmp_header(line, "Message-Id", &val)) {
> +		strbuf_addstr(&sb, val);
>  		decode_header(mi, &sb);
>  		if (mi->add_message_id)
>  			mi->message_id = strbuf_detach(&sb, NULL);

The repeated sequence cmp_header()+strbuf_add{,str}()+decode_header()
makes me itchy.

> @@ -607,8 +608,9 @@ static int is_inbody_header(const struct mailinfo *mi,
>  			    const struct strbuf *line)
>  {
>  	int i;
> +	const char *val;
>  	for (i = 0; header[i]; i++)
> -		if (!mi->s_hdr_data[i] && cmp_header(line, header[i]))
> +		if (!mi->s_hdr_data[i] && cmp_header(line, header[i], &val))
>  			return 1;
>  	return 0;
>  }
>