Calvin Wan <calvinwan@xxxxxxxxxx> writes:

> I also wanted to pose another question to the list regarding defaults
> for parallel processes.  For jobs that clearly scale with the number
> of processes (i.e. jobs that are mostly processor bound), it is
> obvious that setting the default number of processes to the number of
> available cores is optimal.  However, this changes when the job is
> mostly I/O bound or is a mix of I/O and processing.  Looking at my
> use case for `status` on a cold cache (see below), we notice that
> increasing the number of parallel processes speeds up status, but
> past a certain number it actually starts slowing down.

I do not offhand recall how the default parallelism is computed there,
but if I am correct to suspect that "git grep" has a similar scaling
pattern, i.e. the threads all compete for I/O to read files from the
filesystem to find needles in the haystack, perhaps it would give us a
precedent to model the behaviour of this part of the code on, too?
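To make the shape of such a heuristic concrete, here is a rough and
untested sketch of the kind of default being discussed: start from the
core count and clamp it for I/O-bound work.  This is purely
illustrative; default_jobs() and IO_BOUND_DEFAULT_CAP are made-up
names, the cap value is arbitrary and would have to come from actual
measurements, and sysconf(_SC_NPROCESSORS_ONLN) is only roughly what
our online_cpus() helper boils down to on POSIX systems.

	#include <stdio.h>
	#include <unistd.h>

	/*
	 * Arbitrary cap for illustration only; the right value would
	 * have to be determined by benchmarking cold-cache behaviour.
	 */
	#define IO_BOUND_DEFAULT_CAP 8

	static long default_jobs(long user_jobs, int io_bound)
	{
		long jobs = user_jobs;

		if (jobs <= 0)
			jobs = sysconf(_SC_NPROCESSORS_ONLN); /* cf. online_cpus() */
		if (jobs <= 0)
			jobs = 1;
		if (io_bound && jobs > IO_BOUND_DEFAULT_CAP)
			jobs = IO_BOUND_DEFAULT_CAP; /* I/O stops scaling past a point */
		return jobs;
	}

	int main(void)
	{
		printf("cpu-bound default: %ld\n", default_jobs(0, 0));
		printf("io-bound default:  %ld\n", default_jobs(0, 1));
		return 0;
	}

A value the user gives explicitly (via an option or configuration)
would still be honoured as-is; only the fallback default would be
capped for the I/O-heavy case.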