On Tue, Jun 7, 2022 at 4:03 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> > I am very tempted to ask why we do not do this to _all_ loose object
> > files.  Instead of running the machinery twice over the data (once to
> > compute the object name, then to compute the contents and write out),
> > if we can produce loose object files of any size with a single pass,
> > wouldn't that be an overall win?
>
> There is a patch later in the series whose proposed log message has
> benchmarks to show that it is slower in general.  It still is
> curious where the slowness comes from and if it is something we can
> tune, though.
>

Compared with deflating from a single buffer that holds the whole
object, stream_loose_object() works on a limited avail_in buffer and
does not refill it until the whole of avail_in has been consumed.
Because avail_out is also limited, each git_deflate() call may fill
the output buffer before it exhausts avail_in, leaving small avail_in
fragments behind, and I think it is precisely these fragments that
cause the additional git_deflate() loops.

In "unpack-objects", we use a buffer size of 8192.  Increasing the
buffer would alleviate this problem, but maybe it's not worth it?

> Thanks.
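
To illustrate the loop structure I mean, here is a minimal sketch of
streaming compression with fixed-size buffers.  It uses raw zlib, not
Git's git_deflate() wrappers, and deflate_stream() plus the buffer
sizes are illustrative assumptions, not Git's actual code; the point
is only that the inner loop runs once per filled output buffer, so
small buffers multiply the number of deflate() calls.

/*
 * Sketch: stream src into dst through fixed-size in/out buffers.
 * Every refill of "in" restarts the inner loop; a small "out"
 * forces extra deflate() iterations before the next refill.
 */
#include <stdio.h>
#include <zlib.h>

#define IN_SIZE  8192	/* matches the unpack-objects size cited above */
#define OUT_SIZE 8192

static int deflate_stream(FILE *src, FILE *dst)
{
	unsigned char in[IN_SIZE], out[OUT_SIZE];
	z_stream strm = { 0 };	/* zalloc/zfree/opaque default to Z_NULL */
	int flush, ret;

	if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK)
		return -1;

	do {
		/* Outer loop: one pass per IN_SIZE chunk of input. */
		strm.avail_in = fread(in, 1, IN_SIZE, src);
		strm.next_in = in;
		flush = feof(src) ? Z_FINISH : Z_NO_FLUSH;

		/*
		 * Inner loop: one deflate() call per OUT_SIZE chunk of
		 * output; with small buffers this is where the extra
		 * calls pile up.  Error handling is elided for brevity.
		 */
		do {
			strm.avail_out = OUT_SIZE;
			strm.next_out = out;
			ret = deflate(&strm, flush);
			fwrite(out, 1, OUT_SIZE - strm.avail_out, dst);
		} while (strm.avail_out == 0);
	} while (flush != Z_FINISH);

	deflateEnd(&strm);
	return ret == Z_STREAM_END ? 0 : -1;
}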