On Wed, Nov 04, 2020 at 05:33:10PM -0300, Matheus Tavares wrote:

> Use multiple worker processes to distribute the queued entries and call
> write_checkout_item() in parallel for them. The items are distributed
> uniformly in contiguous chunks. This minimizes the chances of two
> workers writing to the same directory simultaneously, which could
> affect performance due to lock contention in the kernel. Work stealing
> (or any other form of re-distribution) is not implemented yet.
>
> The parallel version was benchmarked during three operations in the
> linux repo, with cold cache: cloning v5.8, checking out v5.8 from
> v2.6.15 (checkout I), and checking out v5.8 from v5.7 (checkout II).
> The four tables below show the mean run times and standard deviations
> for 5 runs on: a local file system with SSD, a local file system with
> HDD, a Linux NFS server, and Amazon EFS. The numbers of workers were
> chosen based on what produces the best result for each case.
>
> Local SSD:
>
>              Clone                  Checkout I             Checkout II
> Sequential   8.171 s ± 0.206 s      8.735 s ± 0.230 s      4.166 s ± 0.246 s
> 10 workers   3.277 s ± 0.138 s      3.774 s ± 0.188 s      2.561 s ± 0.120 s
> Speedup      2.49 ± 0.12            2.31 ± 0.13            1.63 ± 0.12
>
> Local HDD:
>
>              Clone                  Checkout I             Checkout II
> Sequential   35.157 s ± 0.205 s     48.835 s ± 0.407 s     47.302 s ± 1.435 s
> 8 workers    35.538 s ± 0.325 s     49.353 s ± 0.826 s     48.919 s ± 0.416 s
> Speedup      0.99 ± 0.01            0.99 ± 0.02            0.97 ± 0.03
>
> Linux NFS server (v4.1, on EBS, single availability zone):
>
>              Clone                  Checkout I             Checkout II
> Sequential   216.070 s ± 3.611 s    211.169 s ± 3.147 s    57.446 s ± 1.301 s
> 32 workers   67.997 s ± 0.740 s     66.563 s ± 0.457 s     23.708 s ± 0.622 s
> Speedup      3.18 ± 0.06            3.17 ± 0.05            2.42 ± 0.08
>
> EFS (v4.1, replicated over multiple availability zones):
>
>              Clone                  Checkout I             Checkout II
> Sequential   1249.329 s ± 13.857 s  1438.979 s ± 78.792 s  543.919 s ± 18.745 s
> 64 workers   225.864 s ± 12.433 s   316.345 s ± 1.887 s    183.648 s ± 10.095 s
> Speedup      5.53 ± 0.31            4.55 ± 0.25            2.96 ± 0.19
>
> The above benchmarks show that parallel checkout is most effective on
> repositories located on an SSD or over a distributed file system. For
> local file systems on spinning disks, and/or older machines, the
> parallelism does not always bring good performance; in fact, it can
> even increase the run time. For this reason, the sequential code is
> still the default. Two settings are added to optionally enable and
> configure the new parallel version as desired.
>
> The local SSD tests were executed on an i7-7700HQ (4 cores with
> hyper-threading) running Manjaro Linux. The local HDD tests were
> executed on an i7-2600 (also 4 cores with hyper-threading) with a
> Seagate Barracuda 7200 rpm SATA 3.0 HDD, running Debian 9.13. The NFS
> and EFS tests were executed on an Amazon EC2 c5n.large instance, with
> 2 vCPUs. The Linux NFS server ran on an m6g.large instance with a 1 TB
> EBS GP2 volume. Before each timing, the linux repository was removed
> (or checked back out), and `sync && sysctl vm.drop_caches=3` was
> executed.
>
> Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
> Co-authored-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> Signed-off-by: Matheus Tavares <matheus.bernardino@xxxxxx>

Having only done a quick skim: is there a reason you are doing the
workqueue handling from scratch rather than using
run-command.h:run_processes_parallel()? The implementation you use and
the one in run-command.c look really similar to me.

 - Emily
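
[Editor's note: the "contiguous chunks" distribution in the quoted
message amounts to splitting the queue's index range evenly among the
workers, with the remainder spread over the first few. The following
standalone sketch illustrates the idea; the function and its output are
hypothetical and are not the code from the patch, which hands each
range to a child process that calls write_checkout_item() on every
entry in it.]

    #include <stdio.h>
    #include <stddef.h>

    /*
     * Illustrative only: split nr_items queued entries into contiguous
     * chunks, one per worker. The first (nr_items % nr_workers) workers
     * take one extra entry so the load stays uniform.
     */
    static void distribute_in_chunks(size_t nr_items, size_t nr_workers)
    {
            size_t base = nr_items / nr_workers;
            size_t extra = nr_items % nr_workers;
            size_t next = 0, w;

            for (w = 0; w < nr_workers; w++) {
                    size_t count = base + (w < extra ? 1 : 0);
                    printf("worker %zu: entries [%zu, %zu)\n",
                           w, next, next + count);
                    next += count;
            }
    }

    int main(void)
    {
            distribute_in_chunks(10, 3); /* e.g. 10 entries, 3 workers */
            return 0;
    }

Because each worker's range is contiguous, entries that live in the
same directory tend to land in the same worker, which is the kernel
lock-contention concern the quoted message raises.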
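
[Editor's note: for context on Emily's question, run_processes_parallel()
in run-command.h drives a pool of up to n child processes through
caller-supplied callbacks. The skeleton below uses the signatures from
2020-era Git (the interface was reworked in later releases); the
pool_state struct and the worker command are hypothetical, shown only
to convey the shape of the API.]

    #include "run-command.h"
    #include "strvec.h"

    struct pool_state {
            int next_chunk, nr_chunks;
    };

    /* Called whenever the pool has a free slot; fill cp with the next
     * command to run, or return 0 when there is no more work. */
    static int next_task(struct child_process *cp, struct strbuf *out,
                         void *pp_cb, void **pp_task_cb)
    {
            struct pool_state *state = pp_cb;

            if (state->next_chunk >= state->nr_chunks)
                    return 0; /* no work left; let the pool drain */

            strvec_push(&cp->args, "some-checkout-worker"); /* hypothetical */
            strvec_pushf(&cp->args, "--chunk=%d", state->next_chunk++);
            return 1; /* one more child to start */
    }

    /* Called as each child exits; returning nonzero asks the machinery
     * to abort the remaining children. */
    static int task_finished(int result, struct strbuf *out,
                             void *pp_cb, void *pp_task_cb)
    {
            return result ? -1 : 0;
    }

    /* Keep up to nr_workers children alive until next_task() returns 0:
     *
     *     struct pool_state state = { 0, nr_chunks };
     *     run_processes_parallel(nr_workers, next_task, NULL,
     *                            task_finished, &state);
     */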