Re: [PATCH] avoid unnecessary malloc of whole file size

Jeff King <peff@xxxxxxxx> · Thu, 24 Jan 2019 16:18:44 -0500

On Thu, Jan 24, 2019 at 01:12:15PM -0800, Junio C Hamano wrote:

> Joey Hess <id@xxxxxxxxxx> writes:
> 
> > When a worktree file is larger than the available memory, and a clean
> > filter is in use, this avoids mallocing a buffer the whole size of the
> > file when reading from the clean filter, which caused commands like git
> > status and git commit to OOM.
> >
> > Often in this situation the clean filter will produce a short identifier
> > for the file, so such a large buffer is not needed.
> >
> > When the clean filter does output something around the same size as the
> > worktree file, the buffer will need to be reallocated until it fits,
> > starting at 8192 and doubling in size. Benchmarking indicates that
> > reallocation is not a significant overhead for outputs up to a
> > few MB in size.
> 
> Problem description first, then solution.  "... this avoids ..." is
> already talking about solution while forcing the readers to know
> what the problem is.
> 
>     When a worktree file is ... filter is in use, we allocate a
>     buffer for the whole size of the file when reading from the
>     clean filter.  This can force us to overallocate if the clean
>     filter is used to radically shrink a huge file and replace it
>     with a small token (e.g. git-annex or git-lfs) and lead to OOM
>     in the worst case.  Reading from the filter and growing the
>     buffer as we go would avoid such an unnecessary OOM.
> 
>     When the clean filter does output ...
>     ... few MB in size.
> 
> perhaps.

Yeah, I agree that organization is nicer. Other than that, the patch
looks good to me.
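
For anyone skimming the thread, the "start at 8192 and double" growth
described in the commit message can be sketched roughly as below. This
is only a standalone illustration, not the code in the patch or in git:
read_growing() and INITIAL_CAP are hypothetical names, and it assumes a
plain POSIX file descriptor standing in for the filter's output.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Starting size taken from the commit message; the name is made up. */
#define INITIAL_CAP 8192

/*
 * Hypothetical helper (not git's actual code): read everything from fd
 * into a heap buffer that starts small and doubles whenever it fills.
 * On success returns the buffer (caller frees) and stores the number of
 * bytes read in *out_len. Returns NULL on read or allocation failure.
 */
char *read_growing(int fd, size_t *out_len)
{
	size_t cap = INITIAL_CAP, len = 0;
	char *buf = malloc(cap);

	if (!buf)
		return NULL;

	for (;;) {
		ssize_t n;

		if (len == cap) {
			/* Buffer is full: double it, guarding against overflow. */
			char *tmp;

			if (cap > (size_t)-1 / 2) {
				free(buf);
				return NULL;
			}
			tmp = realloc(buf, cap * 2);
			if (!tmp) {
				free(buf);
				return NULL;
			}
			buf = tmp;
			cap *= 2;
		}

		n = read(fd, buf + len, cap - len);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			free(buf);
			return NULL;
		}
		if (n == 0)
			break;		/* EOF: the filter is done writing */
		len += (size_t)n;
	}

	*out_len = len;
	return buf;
}
```

With a filter that emits a short token, this never grows past the
initial 8192 bytes no matter how large the worktree file is. Output
close to the original size costs a logarithmic number of reallocs.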

-Peff