Re: [RFC PATCH 7/7] update-index: make use of HASH_N_OBJECTS{,_{FIRST,LAST}} flags

Neeraj Singh <nksingh85@xxxxxxxxx> · Tue, 22 Mar 2022 22:51:35 -0700

On Tue, Mar 22, 2022 at 8:48 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> As with unpack-objects in a preceding commit have update-index.c make
> use of the HASH_N_OBJECTS{,_{FIRST,LAST}} flags. We now have a "batch"
> mode again for "update-index".
>
> Adding the t/* directory from git.git on a Linux ramdisk is a bit
> faster than with the tmp-objdir indirection:
>
>         git hyperfine -L rev ns/batched-fsync,HEAD -s 'make CFLAGS=-O3' -p 'rm -rf repo && git init repo && cp -R t repo/' 'git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' --warmup 1 -r 10
>         Benchmark 1: git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'ns/batched-fsync
>           Time (mean ± σ):     289.8 ms ±   4.0 ms    [User: 186.3 ms, System: 103.2 ms]
>           Range (min … max):   285.6 ms … 297.0 ms    10 runs
>
>         Benchmark 2: git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'HEAD
>           Time (mean ± σ):     273.9 ms ±   7.3 ms    [User: 189.3 ms, System: 84.1 ms]
>           Range (min … max):   267.8 ms … 291.3 ms    10 runs
>
>         Summary
>           'git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'HEAD' ran
>             1.06 ± 0.03 times faster than 'git ls-files -- t | ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'ns/batched-fsync'
>
> And as before running that with "strace --summary-only" slows things
> down a bit (probably mimicking slower I/O a bit). I then get:
>
>         Summary
>           'git ls-files -- t | strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'HEAD' ran
>             1.21 ± 0.02 times faster than 'git ls-files -- t | strace --summary-only ./git -c core.fsync=loose-object -c core.fsyncMethod=batch -C repo update-index --add --stdin' in 'ns/batched-fsync'
>
> We also go from ~51k syscalls to ~39k, with ~2x the number of link()
> and unlink() in ns/batched-fsync.
>
> In the process of doing this conversion we lost the "bulk" mode for
> files added on the command-line. I don't think it's useful to optimize
> that, but we could if anyone cared.
>
> We've also converted this to a string_list, we could walk with
> getline_fn() and get one line "ahead" to see what we have left, but I
> found that state machine a bit painful, and at least in my testing
> buffering this doesn't harm things. But we could also change this to
> stream again, at the cost of some getline_fn() twiddling.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
> ---
>  builtin/update-index.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index af02ff39756..c7cbfe1123b 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -1194,15 +1194,38 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>         }
>
>         if (read_from_stdin) {
> +               struct string_list list = STRING_LIST_INIT_NODUP;
>                 struct strbuf line = STRBUF_INIT;
>                 struct strbuf unquoted = STRBUF_INIT;
> +               size_t i, nr;
> +               unsigned oflags;
>
>                 setup_work_tree();
> -               while (getline_fn(&line, stdin) != EOF)
> -                       line_from_stdin(&line, &unquoted, prefix, prefix_length,
> -                                       nul_term_line, set_executable_bit, 0);
> +               while (getline_fn(&line, stdin) != EOF) {
> +                       size_t len = line.len;
> +                       char *str = strbuf_detach(&line, NULL);
> +
> +                       string_list_append_nodup(&list, str)->util = (void *)len;
> +               }
> +
> +               nr = list.nr;
> +               oflags = nr > 1 ? HASH_N_OBJECTS : 0;
> +               for (i = 0; i < nr; i++) {
> +                       size_t nth = i + 1;
> +                       unsigned f = i == 0 ? HASH_N_OBJECTS_FIRST :
> +                                 nr == nth ? HASH_N_OBJECTS_LAST : 0;
> +                       struct strbuf buf = STRBUF_INIT;
> +                       struct string_list_item *item = list.items + i;
> +                       const size_t len = (size_t)item->util;
> +
> +                       strbuf_attach(&buf, item->string, len, len);
> +                       line_from_stdin(&buf, &unquoted, prefix, prefix_length,
> +                                       nul_term_line, set_executable_bit,
> +                                       oflags | f);
> +                       strbuf_release(&buf);
> +               }
>                 strbuf_release(&unquoted);
> -               strbuf_release(&line);
> +               string_list_clear(&list, 0);
>         }
>
>         if (split_index > 0) {
> --
> 2.35.1.1428.g1c1a0152d61
>

This buffering introduces the same potential risk of the
"stdin-feeder" process not being able to see objects right away as my
version had. I'm planning to mitigate the issue by unplugging the bulk
checkin when issuing a verbose report so that anyone who's using that
output to synchronize can still see what they're expecting.

I think the code you've presented here is a lot of diff to accomplish
the same thing that my series does, where this specific update-index
caller has been roto-tilled to provide the needed
begin/end-transaction points.  And I think there will be a lot of
complexity in supporting the same hints for command-line additions
(which is roughly equivalent to the git-add workflow). Every caller
that wants batch treatment will have to either implement a state
machine or implement a buffering mechanism in order to figure out the
begin-end points. Having a separate plug/unplug call eliminates this
complexity on each caller.

Btw, I'm planning in a future series to reduce the system calls
involved in renaming a file by taking advantage of the renameat2
system call and equivalents on other platforms.  There's a pretty
strong motivation to do that on Windows.

Thanks for the concrete code,
-Neeraj