On Tue, Mar 29, 2022 at 03:56:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> From: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx>
>
> If we want to unpack and write a loose object using "write_loose_object()",
> we have to feed it a buffer of the same size as the object, which
> consumes lots of memory and may cause OOM. This can be improved by
> feeding data to "stream_loose_object()" in a stream.
>
> Add a new function "stream_loose_object()", which is a streaming version
> of "write_loose_object()" with a low memory footprint. We will use this
> function to unpack large blob objects in a later commit.

Just a thought for an optimization you might want to try on top of this
series: use mmap on both the source and target files of your stream. Use
a big 'window' for the mmap (multiple MB) to reduce the TLB flush costs;
those costs should be minimal anyway if Git is single-threaded.

If you can point zlib's source and target buffers at the source and
destination mappings respectively, you'd eliminate two copies of data
into Git's stack buffers.

You might need to over-allocate the destination file if you don't know
the size up front, but an over-allocate followed by a truncate should be
pretty cheap if you're working with a big file.

Thanks,
Neeraj