Re: [PATCH v4] Add git-grep threads param

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 9, 2015 at 9:28 AM, Victor Leschuk
<vleschuk@xxxxxxxxxxxxxxxx> wrote:
>
> Maybe use the simplest version (and keep num_numbers == 0 also as flag for all other checks in code like if(num_flags) .... ):
>
> if (list.nr || cached )
>   num_threads = 0; // do not use threads
> else if (num_threads == 0)
>   num_threads = online_cpus() <= 1 ? 0 : GREP_NUM_THREADS_DEFAULT;

I will say this AGAIN.

The number of threads is *not* about the number of CPU's. Stop this
craziness. It's wrong.

The number of threads is about parallelism. Yes, CPU's is a small part
of it. But as mentioned earlier, the *big* wins are for slow
filesystems, NFS in particular. On NFS, even if you have things
cached, the cache revalidation is going to cause network traffic
almost every time. Being able to have several of those outstanding is
a big deal.

So stop with the "online_cpus()" stuff. And don't base your benchmarks
purely on the CPU-bound case. Because the CPU-bound case is the case
that is already generally so good that few people will care all *that*
deeply.

Many of the things git does are not for "best-case" behavior, but to
avoid bad "worst-case" situations. Look at things like the index
preloading (also threaded). The big win there is - again - when the
stat() calls may need IO. Sure, it can help for CPU use too, but
especially on Linux, cached "stat()" calls are really quite cheap. The
big upside is, again, in situations like git repositories over NFS.

In the CPU-intensive case, the threading might make things go from a
couple of seconds to half a second. Big deal. You're not getting up to
get a coffee in either case.

In the network traffic case, the threading might make things go from
one minute to ten seconds. And *that* is a big deal. That's huge.
That's "annoyingly slow" to "oh, this is the fastest SCM I have ever
worked with in my life".

That can literally be something that changes how you work for a
developer. You start doing things that you simply would never
otherwise do.

So *none* of the threading in git is about CPU's. Maybe we should add
big honking comments about that.

And that big honking comment should be in the documentation for the
new flag too. Because it would be really really sad if people say "I
have a laptop with just two cores, so I'll set the threading option to
2", when they then work mostly over a slow wireless network and their
company wants minimal local installs.

The biggest reason to NOT EVER add that configuration is literally the
confusion about this being about CPU's. That was what got the whole
thread started, that was what the original benchmark numbers were
about, and that was WRONG.

So I would strongly suggest that Junio ignore these patches unless
they very clearly talk about the whole IO issue. Both in the source
code, in the commit messages, and in the documentation for the config
option.

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]