From: Joey Hess <id@xxxxxxxxxx>

We write the output of a "clean" filter into a strbuf. Rather than
growing the strbuf dynamically as we read its output, we make the
initial allocation as large as the original input file. This is a good
guess when the filter is just tweaking a few bytes, but it's disastrous
when the point of the filter is to condense a very large file into a
short identifier (e.g., the way git-lfs and git-annex do). We may ask
to allocate many gigabytes, causing the allocation to fail and Git to
die().

Instead, let's just let strbuf do its usual growth. When the clean
filter does output something around the same size as the worktree
file, the buffer will need to be reallocated until it fits, starting
at 8192 and doubling in size. Benchmarking indicates that reallocation
is not a significant overhead for outputs up to a few MB in size.

Signed-off-by: Joey Hess <id@xxxxxxxxxx>
Signed-off-by: Jeff King <peff@xxxxxxxx>
---
This is a resurrection of the patch from:

  https://public-inbox.org/git/20190122220714.GA6176@xxxxxxxxxxx/

It got stalled on discussion of the commit message, which I've
rewritten here to match the suggestions in the thread.

As discussed there, I do think this only solves half the problem, as
the smudge filter has the same issue in reverse. That's more
complicated to fix, and AFAIK nobody is working on it. But I don't
think there's any reason not to pick up this part in the meantime.

 convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/convert.c b/convert.c
index 5d0307fc10..94ff837649 100644
--- a/convert.c
+++ b/convert.c
@@ -731,7 +731,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 	if (start_async(&async))
 		return 0;	/* error was already reported */
 
-	if (strbuf_read(&nbuf, async.out, len) < 0) {
+	if (strbuf_read(&nbuf, async.out, 0) < 0) {
 		err = error(_("read from external filter '%s' failed"), cmd);
 	}
 	if (close(async.out)) {
-- 
2.21.0.787.g929e938557
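
As a rough illustration of the growth pattern the commit message
describes (start at 8192 bytes and double until the filter output
fits), here is a small standalone sketch -- not git code, and the
sample output sizes are arbitrary -- that counts how many doublings
each size needs, showing why the reallocation overhead stays small
even for multi-MB outputs:

	/*
	 * Standalone sketch: simulate a buffer that starts at 8192
	 * bytes and doubles until it can hold the output, counting
	 * the reallocations required for a few sample sizes.
	 */
	#include <stdio.h>

	int main(void)
	{
		size_t sizes[] = { 4096, 1 << 20, 16 << 20, 256 << 20 };

		for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
			size_t alloc = 8192;
			int reallocs = 0;

			while (alloc < sizes[i]) {
				alloc *= 2;
				reallocs++;
			}
			printf("output %10zu bytes -> %2d doublings, final alloc %zu\n",
			       sizes[i], reallocs, alloc);
		}
		return 0;
	}

A 1 MB output needs 7 doublings and a 256 MB output only 15, each of
which is a cheap realloc compared to reading the data itself.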