[PATCH 0/4] submodule: parallelize status

Calvin Wan <calvinwan@xxxxxxxxxx> · Thu, 22 Sep 2022 23:29:43 +0000

When running 'git status' in a superproject, git spawns a subprocess in
series to run status for every submodule. For projects with many large
submodules, parallelizing status subprocesses can significantly speed
up the runtime for both cold and warm caches.

Here are some timing tests from running status on the Android Open
Source Project (AOSP). My machine has an SSD and 48 cores. 
  Warm Cache:
    git 2.37
Time (mean ± σ):     17.685 s ±  2.040 s    [User: 5.041 s, System: 22.799 s]
Range (min … max):   16.168 s … 22.804 s    10 runs

    this patch (status.parallelSubmodules=1)
Time (mean ± σ):     13.102 s ±  0.500 s    [User: 4.894 s, System: 19.533 s]
Range (min … max):   12.841 s … 14.447 s    10 runs

    this patch (status.parallelSubmodules=5)
Time (mean ± σ):      3.994 s ±  0.152 s    [User: 4.998 s, System: 20.805 s]
Range (min … max):    3.744 s …  4.163 s    10 runs

    this patch (status.parallelSubmodules=10)
Time (mean ± σ):      3.445 s ±  0.085 s    [User: 5.151 s, System: 20.208 s]
Range (min … max):    3.319 s …  3.586 s    10 runs

    this patch (status.parallelSubmodules=20)
Time (mean ± σ):      3.626 s ±  0.109 s    [User: 5.087 s, System: 20.366 s]
Range (min … max):    3.438 s …  3.763 s    10 runs

We can see that there are diminishing returns and even slightly worse
performance after a certain number of max processes, but optimally
there is a speed up factor of around 5.

  Cold Cache:
    git 2.37
      mean of 3 runs: 6m32s
    this patch (status.parallelSubmodules=1)
      mean of 3 runs: 5m34s
    this patch (status.parallelSubmodules=5)
      mean of 3 runs: 2m23s
    this patch (status.parallelSubmodules=10)
      mean of 3 runs: 2m45s
    this patch (status.parallelSubmodules=20)
      mean of 3 runs: 3m23s

We can witness the same phenomenon as above and optimally there is a
speed up factor of around 2.7.

Patch 1 adds output piping to run_processes_parallel so the output
from each submodule can be parsed. Patches 2 and 3 move preexisting
functionality into separate functions and refactor code to prepare
for patch 4 to implement parallelization.

Future work: The reason why status is much slower on a cold cache vs
warm cache is mainly due to refreshing the index. It is worth
investigating whether this is entirely necessary.

Calvin Wan (4):
  run-command: add pipe_output to run_processes_parallel
  submodule: move status parsing into function
  diff-lib: refactor functions
  diff-lib: parallelize run_diff_files for submodules

 Documentation/config/status.txt |   6 +
 add-interactive.c               |   2 +-
 builtin/add.c                   |   4 +-
 builtin/commit.c                |   6 +
 builtin/diff-files.c            |   2 +-
 builtin/diff.c                  |   2 +-
 builtin/merge.c                 |   2 +-
 builtin/stash.c                 |   2 +-
 builtin/submodule--helper.c     |   4 +-
 diff-lib.c                      | 120 +++++++++++++-----
 diff.h                          |   2 +-
 run-command.c                   |   6 +-
 run-command.h                   |   9 ++
 submodule.c                     | 213 +++++++++++++++++++++++++++-----
 submodule.h                     |   9 ++
 t/helper/test-run-command.c     |  31 ++++-
 t/t0061-run-command.sh          |  26 ++++
 wt-status.c                     |   6 +-
 wt-status.h                     |   1 +
 19 files changed, 372 insertions(+), 81 deletions(-)

base-commit: 5502f77b6944eda8e26813d8f542cffe7d110aea
-- 
2.37.3.998.g577e59143f-goog