Re: [PATCH] avoid unnecessary malloc of whole file size

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 24 Jan 2019 13:12:15 -0800

Joey Hess <id@xxxxxxxxxx> writes:

> When a worktree file is larger than the available memory, and a clean
> filter is in use, this avoids mallocing a buffer the whole size of the
> file when reading from the clean filter, which caused commands like git
> status and git commit to OOM.
>
> Often in this situation the clean filter will produce a short identifier
> for the file, so such a large buffer is not needed.
>
> When the clean filter does output something around the same size as the
> worktree file, the buffer will need to be reallocated until it fits,
> starting at 8192 and doubling in size. Benchmarking indicates that
> reallocation is not a significant overhead for outputs up to a
> few MB in size.

Problem description first, then solution.  "... this avoids ..." is
already talking about the solution before the readers have been told
what the problem is.

    When a worktree file is ... filter is in use, we allocate a
    buffer for the whole size of the file when reading from the
    clean filter.  This can force us to overallocate if the clean
    filter is used to radically shrink a huge file and replace it
    with a small token (e.g. git-annex or git-lfs) and lead to OOM
    in the worst case.  Reading from the filter and growing the
    buffer as we go would avoid such an unnecessary OOM.

    When the clean filter does output ...
    ... few MB in size.

perhaps.
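
For readers who want to see the growth strategy from the last
paragraph of the log message spelled out (start at 8192 bytes and
double until the filter output fits), here is a minimal standalone
sketch.  It is not the strbuf code in git; "struct growbuf" and
"read_growing()" are hypothetical names made up for illustration, and
the "hint" parameter only mimics the idea of passing 0 instead of the
worktree file size.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical buffer type for illustration; not git's struct strbuf. */
struct growbuf {
	char *data;
	size_t len;	/* bytes read so far */
	size_t alloc;	/* bytes currently allocated */
};

/*
 * Read fd until EOF into gb.  A non-zero hint preallocates that many
 * bytes (the old behaviour when the worktree file size was passed);
 * a zero hint starts at 8192 bytes and doubles whenever the buffer
 * is full.  Returns 0 on success, -1 on error (the buffer is freed).
 */
static int read_growing(int fd, struct growbuf *gb, size_t hint)
{
	gb->len = 0;
	gb->alloc = hint ? hint : 8192;
	gb->data = malloc(gb->alloc);
	if (!gb->data)
		return -1;

	for (;;) {
		ssize_t got;

		if (gb->len == gb->alloc) {
			char *p;

			/* Refuse to double past what size_t can hold. */
			if (gb->alloc > SIZE_MAX / 2)
				goto fail;
			p = realloc(gb->data, gb->alloc * 2);
			if (!p)
				goto fail;
			gb->data = p;
			gb->alloc *= 2;
		}

		got = read(fd, gb->data + gb->len, gb->alloc - gb->len);
		if (got < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			goto fail;
		}
		if (!got)
			return 0;	/* EOF */
		gb->len += (size_t)got;
	}

fail:
	free(gb->data);
	gb->data = NULL;
	gb->len = gb->alloc = 0;
	return -1;
}
```

With a zero hint, a filter that emits a short token never needs more
than the initial 8192 bytes, while a filter that emits N bytes costs
roughly log2(N / 8192) reallocations.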

> Signed-off-by: Joey Hess <id@xxxxxxxxxx>
> ---
>  convert.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/convert.c b/convert.c
> index 0d89ae7c23..85aebe2ed3 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -732,7 +732,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
>  	if (start_async(&async))
>  		return 0;	/* error was already reported */
>  
> -	if (strbuf_read(&nbuf, async.out, len) < 0) {
> +	if (strbuf_read(&nbuf, async.out, 0) < 0) {
>  		err = error(_("read from external filter '%s' failed"), cmd);
>  	}
>  	if (close(async.out)) {