Re: [PATCH 2/2] fast-import: duplicate into history rather than passing ownership

Mike Hommey <mh@xxxxxxxxxxxx> · Sun, 25 Aug 2019 19:02:13 +0900

On Sun, Aug 25, 2019 at 04:10:55AM -0400, Jeff King wrote:
> Fast-import's read_next_command() has somewhat odd memory ownership
> semantics for the command_buf strbuf. After reading a command, we copy
> the strbuf's pointer (without duplicating the string) into our cmd_hist
> array of recent commands. And then when we're about to read a new
> command, we clear the strbuf by calling strbuf_detach(), dropping
> ownership from the strbuf (leaving the cmd_hist reference as the
> remaining owner).
> 
> This has a few surprising implications:
> 
>   - if the strbuf hasn't been copied into cmd_hist (e.g., because we
>     haven't read any commands yet), then the strbuf_detach() will leak
>     the resulting string
> 
>   - any modification to command_buf risks invalidating the pointer held
>     by cmd_hist. There doesn't seem to be any way to trigger this
>     currently (since we tend to modify it only by detaching and reading
>     in a new value), but it's subtly dangerous.
> 
>   - any pointers into an input string will remain valid as long as
>     cmd_hist points to them. So in general, you can point into
>     command_buf.buf and call read_next_command() up to 100 times before
>     your string is cycled out and freed, leaving you with a dangling
>     pointer. This makes it easy to miss bugs during testing, as they
>     might trigger only for a sufficiently large commit (e.g., the bug
>     fixed in the previous commit).
> 
> Instead, let's make a new string to copy the command into the history
> array, rather than having dual ownership with the old. Then we can drop
> the strbuf_detach() calls entirely, and just reuse the same buffer
> within command_buf over and over. We'd normally have to strbuf_reset()
> it before using it again, but in both cases here we're using
> strbuf_getline(), which does it automatically for us.
> 
> This fixes the leak, and it means that even a single call to
> read_next_command() will invalidate any held pointers, making it easier
> to find bugs. In fact, we can drop the extra input lines added to the
> test case by the previous commit, as the unfixed bug would now trigger
> just from reading the commit message, even without any modified files in
> the commit.
> 
> Reported-by: Mike Hommey <mh@xxxxxxxxxxxx>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  fast-import.c          | 4 +---
>  t/t9300-fast-import.sh | 5 -----
>  2 files changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/fast-import.c b/fast-import.c
> index ee7258037a..1f9160b645 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -1763,7 +1763,6 @@ static int read_next_command(void)
>  		} else {
>  			struct recent_command *rc;
>  
> -			strbuf_detach(&command_buf, NULL);
>  			stdin_eof = strbuf_getline_lf(&command_buf, stdin);
>  			if (stdin_eof)
>  				return EOF;
> @@ -1784,7 +1783,7 @@ static int read_next_command(void)
>  				free(rc->buf);
>  			}
>  
> -			rc->buf = command_buf.buf;
> +			rc->buf = xstrdup(command_buf.buf);

You could xstrndup(command_buf.buf, command_buf.len), which would avoid
a hidden strlen.
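
To make the point concrete, here is a rough standalone sketch of the two
ways to duplicate a buffer whose length is already tracked. It does not
use git's own helpers: struct line_buf, dup_scan() and dup_known_len()
are hypothetical names made up for illustration, with line_buf standing
in for the buf/len pair of a strbuf. The first function pays for a
strlen(), the second reuses the recorded length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a strbuf: a NUL-terminated buffer plus its known length. */
struct line_buf {
	const char *buf;
	size_t len;
};

/* Duplicate the way a plain strdup-style helper would: scan for the NUL first. */
char *dup_scan(const struct line_buf *lb)
{
	size_t n;
	char *copy;

	assert(lb->buf != NULL);
	n = strlen(lb->buf);	/* the "hidden strlen": walks the whole string */
	copy = malloc(n + 1);
	if (!copy)
		return NULL;
	memcpy(copy, lb->buf, n + 1);	/* copies the terminating NUL as well */
	return copy;
}

/* Duplicate using the recorded length, so no scan is needed. */
char *dup_known_len(const struct line_buf *lb)
{
	char *copy;

	assert(lb->buf != NULL);
	copy = malloc(lb->len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, lb->buf, lb->len);
	copy[lb->len] = '\0';	/* terminate explicitly; lb->len excludes the NUL */
	return copy;
}
```

Both return an independent copy that the caller must free(), so the
history entry no longer shares memory with the input buffer either way.
One caveat, from memory rather than checked against the current tree:
xstrndup() itself looks for a NUL within the given length, so a helper
that copies exactly len bytes (xmemdupz(), if I recall correctly) may be
the closer match to the second function.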

Mike