Re: [PATCH v4 5/5] diff-lib: parallelize run_diff_files for submodules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> > diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
> > index 6490527b45..1144a5ad74 100644
> > --- a/Documentation/config/submodule.txt
> > +++ b/Documentation/config/submodule.txt
> > @@ -93,6 +93,18 @@ submodule.fetchJobs::
> >         in parallel. A value of 0 will give some reasonable default.
> >         If unset, it defaults to 1.
> >
> > +submodule.diffJobs::
> > +       Specifies how many submodules are diffed at the same time. A
> > +       positive integer allows up to that number of submodules diffed
> > +       in parallel. A value of 0 will give the number of logical cores.
>
> Why hardcode that 0 gives the number of logical cores?  Why not just
> state that a value of 0 "gives a guess at optimal parallelism",
> allowing us to adjust it in the future if we can do some smart
> heuristics?  It'd be nice to not have us tied down and prevented from
> taking a smarter approach.

I was unaware that the original intention of "reasonable default" was for
flexibility (I have a WIP series standardizing these parallelism config
options that also used "number of logical cores" but I think that should
probably change now). There are other parallel config options that
hardcode 0 as well, so my initial thought was that we should be using
the more precise wording -- the argument for flexibility now seems
more preferable, however.

>
> > +       If unset, it defaults to 1. The diff operation is used by many
> > +       other git commands such as add, merge, diff, status, stash and
> > +       more. Note that the expensive part of the diff operation is
> > +       reading the index from cache or memory. Therefore multiple jobs
> > +       may be detrimental to performance if your hardware does not
> > +       support parallel reads or if the number of jobs greatly exceeds
> > +       the amount of supported reads.
>
> So, in the future, someone who wants to speed things up is going to
> need to configure submodule.diffJobs, submodule.fetchJobs,
> submodule.checkoutJobs, submodule.grepJobs, submodule.mergeJobs, etc.?
>  I worry that we're headed towards a bit of a suboptimal user
> experience here.  It'd be nice to have a more central configuration of
> "yes, I want parallelism; please don't make me benchmark things in
> order to take advantage of it", if that's possible.  It may just be
> that the "optimal" parallelism varies significantly between commands,
> and also varies a lot based on hardware, repository sizes, background
> load on the system, etc. such that we can't provide a reasonable
> suggestion for those that want a value greater than 1.  Or maybe in
> the future we allow folks somehow to request our best guess at a good
> parallelization level and then let users override with these
> individual flags.  I'm just a little worried we might be making users
> do work that we should somehow figure out.

I had the same worry as well -- see the discussion I had here:
https://lore.kernel.org/git/CAFySSZAbsPuyPVX0+DQzArny2CEWs+GpQqJ3AOxUB_ffo8B3SQ@xxxxxxxxxxxxxx/
I would like to also eventually solve this problem, but this patch
won't be the one to do so.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux