On Tue, Jun 7, 2022 at 4:03 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> > I am very tempted to ask why we do not do this to _all_ loose object
> > files.  Instead of running the machinery twice over the data (once to
> > compute the object name, then to compute the contents and write out),
> > if we can produce loose object files of any size with a single pass,
> > wouldn't that be an overall win?
>
> There is a patch later in the series whose proposed log message has
> benchmarks to show that it is slower in general.  It still is
> curious where the slowness comes from and if it is something we can
> tune, though.
>

Compared with deflating from a single buffer that holds the whole
object, stream_loose_object() works on a limited avail_in buffer and
does not refill it until the whole of avail_in has been consumed.
Because avail_out is also limited, each git_deflate() call may fill
the output buffer before it exhausts avail_in, leaving small avail_in
fragments behind, and I think it is precisely these fragments that
cause the additional git_deflate() loops.

In "unpack-objects", we use a buffer size of 8192.  Increasing the
buffer would alleviate this problem, but maybe it's not worth it?

> Thanks.
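
To illustrate the loop structure I mean, here is a minimal sketch of
streaming compression with fixed-size buffers.  It uses raw zlib, not
Git's git_deflate() wrappers, and deflate_stream() plus the buffer
sizes are illustrative assumptions, not Git's actual code; the point
is only that the inner loop runs once per filled output buffer, so
small buffers multiply the number of deflate() calls.

/*
 * Sketch: stream src into dst through fixed-size in/out buffers.
 * Every refill of "in" restarts the inner loop; a small "out"
 * forces extra deflate() iterations before the next refill.
 */
#include <stdio.h>
#include <zlib.h>

#define IN_SIZE  8192	/* matches the unpack-objects size cited above */
#define OUT_SIZE 8192

static int deflate_stream(FILE *src, FILE *dst)
{
	unsigned char in[IN_SIZE], out[OUT_SIZE];
	z_stream strm = { 0 };	/* zalloc/zfree/opaque default to Z_NULL */
	int flush, ret;

	if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK)
		return -1;

	do {
		/* Outer loop: one pass per IN_SIZE chunk of input. */
		strm.avail_in = fread(in, 1, IN_SIZE, src);
		strm.next_in = in;
		flush = feof(src) ? Z_FINISH : Z_NO_FLUSH;

		/*
		 * Inner loop: one deflate() call per OUT_SIZE chunk of
		 * output; with small buffers this is where the extra
		 * calls pile up.  Error handling is elided for brevity.
		 */
		do {
			strm.avail_out = OUT_SIZE;
			strm.next_out = out;
			ret = deflate(&strm, flush);
			fwrite(out, 1, OUT_SIZE - strm.avail_out, dst);
		} while (strm.avail_out == 0);
	} while (flush != Z_FINISH);

	deflateEnd(&strm);
	return ret == Z_STREAM_END ? 0 : -1;
}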