On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
<matheus.bernardino@xxxxxx> wrote:

> The above benchmarks show that parallel checkout is most effective on
> repositories located on an SSD or over a distributed file system. For
> local file systems on spinning disks, and/or older machines, the
> parallelism does not always bring a good performance. For this reason,
> the default value for checkout.workers is one, a.k.a. sequential
> checkout.

I wonder how many people are still using HDDs, and how many will still
be using them in a few years.

I think having 1 as the default value for checkout.workers might be
good for a while, for backward compatibility and stability, while
people who are interested test parallel checkout in different
environments (see the example commands below). But in a few releases,
after any bugs have been fixed, we might want to switch to a default,
maybe 10, that would provide a significant speedup for most people and
justify the added complexity, especially as your numbers still show a
small speedup on HDD when using 10 workers.

> To decide the default value for checkout.thresholdForParallelism,
> another benchmark was executed in the "Local SSD" setup, where parallel
> checkout showed to be beneficial. This time, we compared the runtime of
> a `git checkout -f`, with and without parallelism, after randomly
> removing an increasing number of files from the Linux working tree. The
> "sequential fallback" column bellow corresponds to the executions where

s/bellow/below/

> checkout.workers was 10 but checkout.thresholdForParallelism was equal
> to the number of to-be-updated files plus one (so that we end up writing
> sequentially). Each test case was sampled 15 times, and each sample had
> a randomly different set of files removed. Here are the results:
>
>               sequential fallback       10 workers           speedup
> 10   files     772.3 ms ± 12.6 ms    769.0 ms ± 13.6 ms    1.00 ± 0.02
> 20   files     780.5 ms ± 15.8 ms    775.2 ms ±  9.2 ms    1.01 ± 0.02
> 50   files     806.2 ms ± 13.8 ms    767.4 ms ±  8.5 ms    1.05 ± 0.02
> 100  files     833.7 ms ± 21.4 ms    750.5 ms ± 16.8 ms    1.11 ± 0.04
> 200  files     897.6 ms ± 30.9 ms    730.5 ms ± 14.7 ms    1.23 ± 0.05
> 500  files    1035.4 ms ± 48.0 ms    677.1 ms ± 22.3 ms    1.53 ± 0.09
> 1000 files    1244.6 ms ± 35.6 ms    654.0 ms ± 38.3 ms    1.90 ± 0.12
> 2000 files    1488.8 ms ± 53.4 ms    658.8 ms ± 23.8 ms    2.26 ± 0.12
>
> From the above numbers, 100 files seems to be a reasonable default value
> for the threshold setting.
>
> Note: Up to 1000 files, we observe a drop in the execution time of the
> parallel code with an increase in the number of files. This is a rather
> odd behavior, but it was observed in multiple repetitions. Above 1000
> files, the execution time increases according to the number of files, as
> one would expect.
>
> About the test environments: Local SSD tests were executed on an
> i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
> HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
> with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
> Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
> instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
> instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
> the linux repository was removed (or checked out back to its previous
> state), and `sync && sysctl vm.drop_caches=3` was executed.

Thanks for all these tests and details!

> Co-authored-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> Signed-off-by: Matheus Tavares <matheus.bernardino@xxxxxx>

[...]
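By the way, for people who want to experiment with this in their own
environments, as suggested above, it should just be a matter of
something like the following (the config names come from your patch;
the values here are only examples):

	git config checkout.workers 10
	git config checkout.thresholdForParallelism 100
	git checkout -f

(assuming there is something for `git checkout -f` to rewrite, for
example after deleting some tracked files as in your benchmark).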
> @@ -23,6 +25,19 @@ enum pc_status parallel_checkout_status(void)
>  	return parallel_checkout.status;
>  }
>
> +#define DEFAULT_THRESHOLD_FOR_PARALLELISM 100

Using a "static const int" might be a bit better.

> +void get_parallel_checkout_configs(int *num_workers, int *threshold)
> +{
> +	if (git_config_get_int("checkout.workers", num_workers))
> +		*num_workers = 1;

I think it's better when an important default value like this "1" is
made more visible using a "static const int" or a "#define" (see the
sketch at the end of this email).

> +	else if (*num_workers < 1)
> +		*num_workers = online_cpus();
> +
> +	if (git_config_get_int("checkout.thresholdForParallelism", threshold))
> +		*threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
> +}
> +
>  void init_parallel_checkout(void)
>  {
>  	if (parallel_checkout.status != PC_UNINITIALIZED)

> @@ -554,11 +569,9 @@ static void write_items_sequentially(struct checkout *state)
>  		write_pc_item(&parallel_checkout.items[i], state);
>  }
>
> -#define DEFAULT_NUM_WORKERS 2

As I said above, it doesn't look like a good idea to have removed this.

> -int run_parallel_checkout(struct checkout *state)
> +int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
>  {
> -	int ret, num_workers = DEFAULT_NUM_WORKERS;
> +	int ret;
>
>  	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
>  		BUG("cannot run parallel checkout: uninitialized or already running");

> @@ -568,7 +581,7 @@ int run_parallel_checkout(struct checkout *state)
>  	if (parallel_checkout.nr < num_workers)
>  		num_workers = parallel_checkout.nr;
>
> -	if (num_workers <= 1) {
> +	if (num_workers <= 1 || parallel_checkout.nr < threshold) {

Here we check the number of workers...

>  		write_items_sequentially(state);
>  	} else {
>  		struct pc_worker *workers = setup_workers(state, num_workers);

[...]

> @@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
>  		}
>  	}
>  	stop_progress(&progress);
> -	errs |= run_parallel_checkout(&state);
> +	if (pc_workers > 1)

...but the number of workers was already checked here.

> +		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
>  	errs |= finish_delayed_checkout(&state, NULL);
>
>  	git_attr_set_direction(GIT_ATTR_CHECKIN);
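To illustrate the "static const int" suggestions above, here is a
rough, untested sketch (the DEFAULT_CHECKOUT_WORKERS name is just
something I made up; the function body is otherwise what your patch
already has):

	/* Both defaults named and visible in one place. */
	static const int DEFAULT_CHECKOUT_WORKERS = 1;
	static const int DEFAULT_THRESHOLD_FOR_PARALLELISM = 100;

	void get_parallel_checkout_configs(int *num_workers, int *threshold)
	{
		if (git_config_get_int("checkout.workers", num_workers))
			*num_workers = DEFAULT_CHECKOUT_WORKERS;
		else if (*num_workers < 1)
			*num_workers = online_cpus();

		if (git_config_get_int("checkout.thresholdForParallelism", threshold))
			*threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
	}

And regarding the duplicated worker-count check, maybe one of the two
could go away, or a comment could explain why check_updates() needs
its own `pc_workers > 1` guard in addition to the `num_workers <= 1`
test in run_parallel_checkout().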