Re: [PATCH v4 4/5] fast-export: do not modify memory from get_commit_buffer

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 03 May 2021 13:41:00 +0900

Luke Shumaker <lukeshu@xxxxxxxxxxx> writes:

> +static char *reencode_message(const char *in_msg,
> +			      const char *in_encoding, size_t in_encoding_len)
> +{
> +	static struct strbuf in_encoding_buf = STRBUF_INIT;
> +
> +	strbuf_reset(&in_encoding_buf);
> +	strbuf_add(&in_encoding_buf, in_encoding, in_encoding_len);
> +
> +	return reencode_string(in_msg, "UTF-8", in_encoding_buf.buf);
> +}

There is only a single caller of this, so making it caller's
responsibility to do the strbuf thing would allow us to make this
thread-safe quite easily (and at that point we might not even have
this helper function).

> +	committer = strstr(commit_buffer_cursor, "\ncommitter ");
>  	if (!committer)
>  		die("could not find committer in commit %s",
>  		    oid_to_hex(&commit->object.oid));
>  	committer++;
> -	committer_end = strchrnul(committer, '\n');
> -	message = strstr(committer_end, "\n\n");
> -	encoding = find_encoding(committer_end, message);
> +	commit_buffer_cursor = committer_end = strchrnul(committer, '\n');
> +
> +	/* find_commit_header() gets a `+ 1` because
> +	 * commit_buffer_cursor points at the trailing "\n" at the end
> +	 * of the previous line, but find_commit_header() wants a
> +	 * pointer to the beginning of the next line. */
> +	encoding = find_commit_header(commit_buffer_cursor + 1, "encoding", &encoding_len);

	/*
	 * Our multi-line comments have opening and closing
	 * slash-asterisk and asterisk-slash on their own
	 * lines.
	 */

What if strchrnul() returned a pointer to the terminating NUL
instead of the LF at the end of the line?  +1 will run past the end
of the buffer.

> +	if (encoding)
> +		commit_buffer_cursor = encoding + encoding_len;
> +
> +	message = strstr(commit_buffer_cursor, "\n\n");

Good.

> @@ -685,14 +693,15 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
>  	} else if (encoding) {
>  		switch(reencode_mode) {
>  		case REENCODE_YES:
> -			reencoded = reencode_string(message, "UTF-8", encoding);
> +			reencoded = reencode_message(message, encoding, encoding_len);
>  			break;

Here is where we can do the temporary strbuf to hold encoding[0,
encoding_len] and directly call reencode_string().

Other than that, this step looks good to me.

Thanks.