Re: [RFC PATCH 7/7] update-index: make use of HASH_N_OBJECTS{,_{FIRST,LAST}} flags

Neeraj Singh <nksingh85@xxxxxxxxx> · Wed, 23 Mar 2022 13:19:32 -0700

I'm going to respond in more detail to your individual patches,
(expect the last mail to contain a comment at the end "LAST MAIL").

On Wed, Mar 23, 2022 at 3:52 AM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
>
> On Tue, Mar 22 2022, Neeraj Singh wrote:
>
> > On Tue, Mar 22, 2022 at 8:48 PM Ævar Arnfjörð Bjarmason
> > <avarab@xxxxxxxxx> wrote:
> >>
> >> As with unpack-objects in a preceding commit have update-index.c make
> >> use of the HASH_N_OBJECTS{,_{FIRST,LAST}} flags. We now have a "batch"
> >> mode again for "update-index".
> >>
> >> Adding the t/* directory from git.git on a Linux ramdisk is a bit
> >> faster than with the tmp-objdir indirection:
> >>
> >>         git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3' -p 'rm -rf repo && git init repo && cp -R t repo/' 'git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' --warmup 1 -r 10
> >>         Benchmark 1: git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'ns/batched-fsync
> >>           Time (mean ± σ):     289.8 ms ±   4.0 ms    [User: 186.3 ms, System: 103.2 ms]
> >>           Range (min … max):   285.6 ms … 297.0 ms    10 runs
> >>
> >>         Benchmark 2: git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'HEAD
> >>           Time (mean ± σ):     273.9 ms ±   7.3 ms    [User: 189.3 ms, System: 84.1 ms]
> >>           Range (min … max):   267.8 ms … 291.3 ms    10 runs
> >>
> >>         Summary
> >>           'git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'HEAD' ran
> >>             1.06 ± 0.03 times faster than 'git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'ns/batched-fsync'
> >>
> >> And as before running that with "strace --summary-only" slows things
> >> down a bit (probably mimicking slower I/O a bit). I then get:
> >>
> >>         Summary
> >>           'git ls-files -- t | strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'HEAD' ran
> >>             1.21 ± 0.02 times faster than 'git ls-files -- t | strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'ns/batched-fsync'
> >>
> >> We also go from ~51k syscalls to ~39k, with ~2x the number of link()
> >> and unlink() in ns/batched-fsync.
> >>
> >> In the process of doing this conversion we lost the "bulk" mode for
> >> files added on the command-line. I don't think it's useful to optimize
> >> that, but we could if anyone cared.
> >>
> >> We've also converted this to a string_list, we could walk with
> >> getline_fn() and get one line "ahead" to see what we have left, but I
> >> found that state machine a bit painful, and at least in my testing
> >> buffering this doesn't harm things. But we could also change this to
> >> stream again, at the cost of some getline_fn() twiddling.
> >>
> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
> >> ---
> >>  builtin/update-index.c | 31 +++++++++++++++++++++++++++----
> >>  1 file changed, 27 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/builtin/update-index.c b/builtin/update-index.c
> >> index af02ff39756..c7cbfe1123b 100644
> >> --- a/builtin/update-index.c
> >> +++ b/builtin/update-index.c
> >> @@ -1194,15 +1194,38 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
> >>         }
> >>
> >>         if (read_from_stdin) {
> >> +               struct string_list list = STRING_LIST_INIT_NODUP;
> >>                 struct strbuf line = STRBUF_INIT;
> >>                 struct strbuf unquoted = STRBUF_INIT;
> >> +               size_t i, nr;
> >> +               unsigned oflags;
> >>
> >>                 setup_work_tree();
> >> -               while (getline_fn(&line, stdin) != EOF)
> >> -                       line_from_stdin(&line, &unquoted, prefix, prefix_length,
> >> -                                       nul_term_line, set_executable_bit, 0);
> >> +               while (getline_fn(&line, stdin) != EOF) {
> >> +                       size_t len = line.len;
> >> +                       char *str = strbuf_detach(&line, NULL);
> >> +
> >> +                       string_list_append_nodup(&list, str)->util = (void *)len;
> >> +               }
> >> +
> >> +               nr = list.nr;
> >> +               oflags = nr > 1 ? HASH_N_OBJECTS : 0;
> >> +               for (i = 0; i < nr; i++) {
> >> +                       size_t nth = i + 1;
> >> +                       unsigned f = i == 0 ? HASH_N_OBJECTS_FIRST :
> >> +                                 nr == nth ? HASH_N_OBJECTS_LAST : 0;
> >> +                       struct strbuf buf = STRBUF_INIT;
> >> +                       struct string_list_item *item = list.items + i;
> >> +                       const size_t len = (size_t)item->util;
> >> +
> >> +                       strbuf_attach(&buf, item->string, len, len);
> >> +                       line_from_stdin(&buf, &unquoted, prefix, prefix_length,
> >> +                                       nul_term_line, set_executable_bit,
> >> +                                       oflags | f);
> >> +                       strbuf_release(&buf);
> >> +               }
> >>                 strbuf_release(&unquoted);
> >> -               strbuf_release(&line);
> >> +               string_list_clear(&list, 0);
> >>         }
> >>
> >>         if (split_index > 0) {
> >> --
> >> 2.35.1.1428.g1c1a0152d61
> >>
> >
> > This buffering introduces the same potential risk of the
> > "stdin-feeder" process not being able to see objects right away as my
> > version had. I'm planning to mitigate the issue by unplugging the bulk
> > checkin when issuing a verbose report so that anyone who's using that
> > output to synchronize can still see what they're expecting.
>
> I was rather terse in the commit message, I meant (but forgot some
> words) "doesn't harm thing for performance [in the above test]", but
> converting this to a string_list is clearly & regression that shouldn't
> be kept.
>
> I just wanted to demonstrate method of doing this by passing down the
> HASH_* flags, and found that writing the state-machine to "buffer ahead"
> by one line so that we can eventually know in the loop if we're in the
> "last" line or not was tedious, so I came up with this POC. But we
> clearly shouldn't lose the "streaming" aspect.
>