Re: [PATCH v9 0/5] unpack large blobs in stream

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 02 Feb 2022 11:59:47 +0100

On Wed, Feb 02 2022, Han Xin wrote:

> On Wed, Feb 2, 2022 at 5:28 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>>
>>
>> On Thu, Jan 20 2022, Han Xin wrote:
>>
>> > From: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx>
>> >
>> > Changes since v8:
>> > * Rename "assert_no_loose ()" into "test_no_loose ()" in
>> >   "t5329-unpack-large-objects.sh". Remove "assert_no_pack ()" and use
>> >   "test_dir_is_empty" instead.
>> >
>> > * Revert changes to "create_tmpfile()" and error handling is now in
>> >   "start_loose_object_common()".
>> >
>> > * Remove "finalize_object_file_with_mtime()" which seems to be an overkill
>> >   for "write_loose_object()" now.
>> >
>> > * Remove the commit "object-file.c: remove the slash for directory_size()",
>> >   it can be in a separate patch if necessary.
>> >
>> > Han Xin (4):
>> >   unpack-objects: low memory footprint for get_data() in dry_run mode
>> >   object-file.c: refactor write_loose_object() to several steps
>> >   object-file.c: add "stream_loose_object()" to handle large object
>> >   unpack-objects: unpack_non_delta_entry() read data in a stream
>> >
>> > Ævar Arnfjörð Bjarmason (1):
>> >   object-file API: add a format_object_header() function
>>
>> I sent
>> https://lore.kernel.org/git/cover-00.10-00000000000-20220201T144803Z-avarab@xxxxxxxxx/
>> today which suggests splitting out the 5/5 cleanup you'd integrated.
>>
>> I then rebased these patches of yours on top of that, the result is
>> here:
>> https://github.com/avar/git/tree/han-xin-avar/unpack-loose-object-streaming-9
>>
>> The range-diff to your version is below. There's a few unrelated
>> fixes/nits in it.
>>
>> I think with/without basing this on top of my series above your patches
>> here look good with the nits pointed out in the diff below addressed
>> (and some don't need to be). I.e. the dependency on it is rather
>> trivial, and the two could be split up.
>>
>> What do you think is a good way to proceed? I could just submit the
>> below as a proposed v10 if you'd like & agree...
>>
>
> Yes, thanks for the suggestions, and I'm glad you're happy to do so.

Willdo.

>> 1:  553a9377eb3 ! 1:  61fcfe7b840 unpack-objects: low memory footprint for get_data() in dry_run mode
>>     @@ Commit message
>>          unpack-objects: low memory footprint for get_data() in dry_run mode
>>
>>          As the name implies, "get_data(size)" will allocate and return a given
>>     -    size of memory. Allocating memory for a large blob object may cause the
>>     +    amount of memory. Allocating memory for a large blob object may cause the
>>          system to run out of memory. Before preparing to replace calling of
>>          "get_data()" to unpack large blob objects in latter commits, refactor
>>          "get_data()" to reduce memory footprint for dry_run mode.
>>     @@ Commit message
>>
>>          Suggested-by: Jiang Xin <zhiyou.jx@xxxxxxxxxxxxxxx>
>>          Signed-off-by: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx>
>>     +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>>
>>       ## builtin/unpack-objects.c ##
>>      @@ builtin/unpack-objects.c: static void use(int bytes)
>>     @@ t/t5328-unpack-large-objects.sh (new)
>>      +
>>      +test_no_loose () {
>>      +  glob=dest.git/objects/?? &&
>>     -+  echo "$glob" >expect &&
>>     -+  eval "echo $glob" >actual &&
>>     ++  echo $glob >expect &&
>>     ++  echo "$glob" >actual &&
>>      +  test_cmp expect actual
>>      +}
>>      +
>
> I have a small doubt with this, it works fine with dash, but not
> others like zsh. Wouldn't
> it be better to do compatibility, or would it introduce other issues
> that I don't know?

Ah, I hadn't spotted that zsh issue. I don't think the test suite will
run on it in general, but in any case I'll fix this.

There's a few other tests that do this just by piping "find" to "wc -l",
it's probably better to just follow that pattern. I think the eval
works, but I thought it was a bit unusual/stood out.