Re: [PATCH v4] Add git-grep threads param

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 09, 2015 at 09:55:34AM -0800, Linus Torvalds wrote:

> > if (list.nr || cached )
> >   num_threads = 0; // do not use threads
> > else if (num_threads == 0)
> >   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;
> 
> I will say this AGAIN.
> 
> The number of threads is *not* about the number of CPU's. Stop this
> craziness. It's wrong.

I agree with you (and that's why we have a hard-coded default here, and
not online_cpus()).

This check is just whether to turn the threading on or not, based on
whether we have multiple CPUs at all. And I agree that probably should
_not_ be respecting online_cpus() either, because we are better off
parallelizing even on a single CPU to avoid fs latency.

But note that this is already what the code _currently_ does. I do not
mind changing it, but that is a separate notion from Victor's topic,
which is to make the number of threads configurable at all.

What I'd really hope to see is:

  1. Victor adds --threads support, and changes _nothing else_ about the
     defaults.

  2. People experiment with --threads to see what works best in a
     variety of cases (especially things like cold-cache NFS). Then we
     get patches based on real world numbers to adjust:

       2a. GREP_NUM_THREADS_DEFAULT (either bumping the hard-coded
           value, or using some dynamic heuristic)

       2b. What to do for the single-CPU case.

We're doing step 1 here. I don't mind jumping to 2b if we're pretty sure
it's the right thing (and I think it probably is), but it should
definitely be a separate patch.

> So *none* of the threading in git is about CPU's. Maybe we should add
> big honking comments about that.

Minor nit: there definitely _are_ CPU-bound parallel tasks in git.
pack-objects and index-pack come to mind. If we add any user-visible
advice about tweaking thread configuration, we should be sure
that it is scoped to the appropriate flags.

> And that big honking comment should be in the documentation for the
> new flag too. Because it would be really really sad if people say "I
> have a laptop with just two cores, so I'll set the threading option to
> 2", when they then work mostly over a slow wireless network and their
> company wants minimal local installs.

Agreed. It would probably make sense for the "--threads" option
documentation for each command to discuss whether it is meant to
parallelize work across CPUs, or avoid filesystem latency.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]