On Wed, Dec 22, 2021 at 2:56 AM Han Xin <chiyutianyi@xxxxxxxxx> wrote:
>
> From: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx>
>
> We used to call "get_data()" in "unpack_non_delta_entry()" to read the
> entire contents of a blob object, no matter how big it is. This
> implementation may consume all the memory and cause OOM.
>
> By implementing a zstream version of input_stream interface, we can use
> a small fixed buffer for "unpack_non_delta_entry()".
>
> However, unpack non-delta objects from a stream instead of from an
> entrie buffer will have 10% performance penalty. Therefore, only unpack
> object larger than the "core.BigFileStreamingThreshold" in zstream. See
> the following benchmarks:
>
>     hyperfine \
>       --setup \
>       'if ! test -d scalar.git; then git clone --bare https://github.com/microsoft/scalar.git; cp scalar.git/objects/pack/*.pack small.pack; fi' \
>       --prepare 'rm -rf dest.git && git init --bare dest.git'
>
>     Summary
>       './git -C dest.git -c core.bigfilethreshold=512m unpack-objects <small.pack' in 'origin/master'
>         1.01 ± 0.04 times faster than './git -C dest.git -c core.bigfilethreshold=512m unpack-objects <small.pack' in 'HEAD~1'
>         1.01 ± 0.04 times faster than './git -C dest.git -c core.bigfilethreshold=512m unpack-objects <small.pack' in 'HEAD~0'
>         1.03 ± 0.10 times faster than './git -C dest.git -c core.bigfilethreshold=16k unpack-objects <small.pack' in 'origin/master'
>         1.02 ± 0.07 times faster than './git -C dest.git -c core.bigfilethreshold=16k unpack-objects <small.pack' in 'HEAD~0'
>         1.10 ± 0.04 times faster than './git -C dest.git -c core.bigfilethreshold=16k unpack-objects <small.pack' in 'HEAD~1'
>
> Helped-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
> Helped-by: Derrick Stolee <stolee@xxxxxxxxx>
> Helped-by: Jiang Xin <zhiyou.jx@xxxxxxxxxxxxxxx>
> Signed-off-by: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx>
> ---
>  Documentation/config/core.txt | 11 +++++
>  builtin/unpack-objects.c | 73 ++++++++++++++++++++++++++++-
>  cache.h | 1 +
>  config.c | 5 ++
>  environment.c | 1 +
>  t/t5590-unpack-non-delta-objects.sh | 36 +++++++++++++-
>  6 files changed, 125 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index c04f62a54a..601b7a2418 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -424,6 +424,17 @@ be delta compressed, but larger binary media files won't be.
>  +
>  Common unit suffixes of 'k', 'm', or 'g' are supported.
>
> +core.bigFileStreamingThreshold::
> +       Files larger than this will be streamed out to a temporary
> +       object file while being hashed, which will when be renamed
> +       in-place to a loose object, particularly if the
> +       `core.bigFileThreshold' setting dictates that they're always
> +       written out as loose objects.

Han Xin told me the reason for introducing another git config variable,
but I don't think it is a good idea to introduce an application-specific
config variable as "core.XXX" and parse it in "config.c".

So in patch v8, we will still reuse the config variable
"core.bigFileThreshold", and will introduce an application-specific
config variable, such as "unpack.bigFileThreshold", and parse the new
config in "builtin/unpack-objects.c".

--
Jiang Xin
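
To illustrate the idea of parsing an application-specific threshold locally
rather than in "config.c", here is a minimal standalone sketch. It does not
use Git's internal config API, and it is not the v8 patch: the names
parse_size(), unpack_config() and should_stream() are hypothetical, the key
"unpack.bigfilethreshold" is only the example name suggested above, and the
512m default simply mirrors the value used in the quoted benchmark.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical default; 512m is the value used in the quoted benchmark. */
static unsigned long unpack_big_file_threshold = 512UL * 1024 * 1024;

/*
 * Parse a size such as "16k", "512m" or "1g" (or a plain byte count).
 * Returns 0 on success and -1 on malformed input or overflow.
 */
int parse_size(const char *value, unsigned long *out)
{
	char *end;
	unsigned long long n, factor = 1;

	if (!value || !isdigit((unsigned char)*value))
		return -1;
	errno = 0;
	n = strtoull(value, &end, 10);
	if (errno)
		return -1;
	switch (*end) {
	case 'k': case 'K': factor = 1024ULL; end++; break;
	case 'm': case 'M': factor = 1024ULL * 1024; end++; break;
	case 'g': case 'G': factor = 1024ULL * 1024 * 1024; end++; break;
	}
	if (*end)			/* trailing garbage, e.g. "12x" */
		return -1;
	if (n > ULONG_MAX / factor)	/* would not fit in unsigned long */
		return -1;
	*out = (unsigned long)(n * factor);
	return 0;
}

/*
 * Config callback local to the command: handle only our own key and
 * ignore everything else (returning 0 for keys that are not ours).
 */
int unpack_config(const char *var, const char *value)
{
	assert(var);
	if (!strcasecmp(var, "unpack.bigfilethreshold"))
		return parse_size(value, &unpack_big_file_threshold);
	return 0;
}

/* Stream only objects strictly larger than the threshold. */
int should_stream(unsigned long size)
{
	return size > unpack_big_file_threshold;
}
```

With the default in place a 1 KiB object would be unpacked in memory; after
unpack_config("unpack.bigFileThreshold", "16k"), any object larger than
16384 bytes would take the streaming path.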