On Wed, Nov 04, 2020 at 05:33:10PM -0300, Matheus Tavares wrote:

> Use multiple worker processes to distribute the queued entries and call
> write_checkout_item() in parallel for them. The items are distributed
> uniformly in contiguous chunks. This minimizes the chances of two
> workers writing to the same directory simultaneously, which could
> affect performance due to lock contention in the kernel. Work stealing
> (or any other form of re-distribution) is not implemented yet.
>
> The parallel version was benchmarked during three operations in the
> linux repo, with cold cache: cloning v5.8, checking out v5.8 from
> v2.6.15 (checkout I), and checking out v5.8 from v5.7 (checkout II).
> The four tables below show the mean run times and standard deviations
> for 5 runs on: a local file system with SSD, a local file system with
> HDD, a Linux NFS server, and Amazon EFS. The numbers of workers were
> chosen based on what produces the best result for each case.
>
> Local SSD:
>
>              Clone                  Checkout I             Checkout II
> Sequential   8.171 s ± 0.206 s      8.735 s ± 0.230 s      4.166 s ± 0.246 s
> 10 workers   3.277 s ± 0.138 s      3.774 s ± 0.188 s      2.561 s ± 0.120 s
> Speedup      2.49 ± 0.12            2.31 ± 0.13            1.63 ± 0.12
>
> Local HDD:
>
>              Clone                  Checkout I             Checkout II
> Sequential   35.157 s ± 0.205 s     48.835 s ± 0.407 s     47.302 s ± 1.435 s
> 8 workers    35.538 s ± 0.325 s     49.353 s ± 0.826 s     48.919 s ± 0.416 s
> Speedup      0.99 ± 0.01            0.99 ± 0.02            0.97 ± 0.03
>
> Linux NFS server (v4.1, on EBS, single availability zone):
>
>              Clone                  Checkout I             Checkout II
> Sequential   216.070 s ± 3.611 s    211.169 s ± 3.147 s    57.446 s ± 1.301 s
> 32 workers   67.997 s ± 0.740 s     66.563 s ± 0.457 s     23.708 s ± 0.622 s
> Speedup      3.18 ± 0.06            3.17 ± 0.05            2.42 ± 0.08
>
> EFS (v4.1, replicated over multiple availability zones):
>
>              Clone                  Checkout I             Checkout II
> Sequential   1249.329 s ± 13.857 s  1438.979 s ± 78.792 s  543.919 s ± 18.745 s
> 64 workers   225.864 s ± 12.433 s   316.345 s ± 1.887 s    183.648 s ± 10.095 s
> Speedup      5.53 ± 0.31            4.55 ± 0.25            2.96 ± 0.19
>
> The above benchmarks show that parallel checkout is most effective on
> repositories located on an SSD or over a distributed file system. For
> local file systems on spinning disks, and/or older machines, the
> parallelism does not always bring good performance; in fact, it can
> even increase the run time. For this reason, the sequential code is
> still the default. Two settings are added to optionally enable and
> configure the new parallel version as desired.
>
> The local SSD tests were executed on an i7-7700HQ (4 cores with
> hyper-threading) running Manjaro Linux. The local HDD tests were
> executed on an i7-2600 (also 4 cores with hyper-threading) with a
> Seagate Barracuda 7200 rpm SATA 3.0 HDD, running Debian 9.13. The NFS
> and EFS tests were executed on an Amazon EC2 c5n.large instance, with
> 2 vCPUs. The Linux NFS server ran on an m6g.large instance with a 1 TB
> EBS GP2 volume. Before each timing, the linux repository was removed
> (or checked back out), and `sync && sysctl vm.drop_caches=3` was
> executed.
>
> Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
> Co-authored-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> Signed-off-by: Matheus Tavares <matheus.bernardino@xxxxxx>

Having only done a quick skim: is there a reason you are doing the
workqueue handling from scratch rather than using
run-command.h:run_processes_parallel()? The implementation you use and
the one in run-command.c look really similar to me.

 - Emily
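
[Editor's note: the "contiguous chunks" distribution in the quoted
message amounts to splitting the queue's index range evenly among the
workers, with the remainder spread over the first few. The following
standalone sketch illustrates the idea; the function and its output are
hypothetical and are not the code from the patch, which hands each
range to a child process that calls write_checkout_item() on every
entry in it.]

    #include <stdio.h>
    #include <stddef.h>

    /*
     * Illustrative only: split nr_items queued entries into contiguous
     * chunks, one per worker. The first (nr_items % nr_workers) workers
     * take one extra entry so the load stays uniform.
     */
    static void distribute_in_chunks(size_t nr_items, size_t nr_workers)
    {
            size_t base = nr_items / nr_workers;
            size_t extra = nr_items % nr_workers;
            size_t next = 0, w;

            for (w = 0; w < nr_workers; w++) {
                    size_t count = base + (w < extra ? 1 : 0);
                    printf("worker %zu: entries [%zu, %zu)\n",
                           w, next, next + count);
                    next += count;
            }
    }

    int main(void)
    {
            distribute_in_chunks(10, 3); /* e.g. 10 entries, 3 workers */
            return 0;
    }

Because each worker's range is contiguous, entries that live in the
same directory tend to land in the same worker, which is the kernel
lock-contention concern the quoted message raises.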
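
[Editor's note: for context on Emily's question, run_processes_parallel()
in run-command.h drives a pool of up to n child processes through
caller-supplied callbacks. The skeleton below uses the signatures from
2020-era Git (the interface was reworked in later releases); the
pool_state struct and the worker command are hypothetical, shown only
to convey the shape of the API.]

    #include "run-command.h"
    #include "strvec.h"

    struct pool_state {
            int next_chunk, nr_chunks;
    };

    /* Called whenever the pool has a free slot; fill cp with the next
     * command to run, or return 0 when there is no more work. */
    static int next_task(struct child_process *cp, struct strbuf *out,
                         void *pp_cb, void **pp_task_cb)
    {
            struct pool_state *state = pp_cb;

            if (state->next_chunk >= state->nr_chunks)
                    return 0; /* no work left; let the pool drain */

            strvec_push(&cp->args, "some-checkout-worker"); /* hypothetical */
            strvec_pushf(&cp->args, "--chunk=%d", state->next_chunk++);
            return 1; /* one more child to start */
    }

    /* Called as each child exits; returning nonzero asks the machinery
     * to abort the remaining children. */
    static int task_finished(int result, struct strbuf *out,
                             void *pp_cb, void *pp_task_cb)
    {
            return result ? -1 : 0;
    }

    /* Keep up to nr_workers children alive until next_task() returns 0:
     *
     *     struct pool_state state = { 0, nr_chunks };
     *     run_processes_parallel(nr_workers, next_task, NULL,
     *                            task_finished, &state);
     */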