Re: Resolving deltas dominates clone time

On Tue, Apr 23, 2019 at 05:08:40PM +0700, Duy Nguyen wrote:

> On Tue, Apr 23, 2019 at 11:45 AM Jeff King <peff@xxxxxxxx> wrote:
> >
> > On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote:
> >
> > > Here are my p5302 numbers on linux.git, by the way.
> > >
> > >   Test                                           jk/p5302-repeat-fix
> > >   ------------------------------------------------------------------
> > >   5302.2: index-pack 0 threads                   307.04(303.74+3.30)
> > >   5302.3: index-pack 1 thread                    309.74(306.13+3.56)
> > >   5302.4: index-pack 2 threads                   177.89(313.73+3.60)
> > >   5302.5: index-pack 4 threads                   117.14(344.07+4.29)
> > >   5302.6: index-pack 8 threads                   112.40(607.12+5.80)
> > >   5302.7: index-pack default number of threads   135.00(322.03+3.74)
> > >
> > > which still imply that "4" is a win over "3" ("8" is slightly better
> > > still in wall-clock time, but the total CPU rises dramatically; that's
> > > probably because this is a quad-core with hyperthreading, so by that
> > > point we're just throttling down the CPUs).
> >
> > And here's a similar test run on a 20-core Xeon w/ hyperthreading (I
> > tweaked the test to keep going after eight threads):
> >
> > Test                            HEAD
> > ----------------------------------------------------
> > 5302.2: index-pack 1 threads    376.88(364.50+11.52)
> > 5302.3: index-pack 2 threads    228.13(371.21+17.86)
> > 5302.4: index-pack 4 threads    151.41(387.06+21.12)
> > 5302.5: index-pack 8 threads    113.68(413.40+25.80)
> > 5302.6: index-pack 16 threads   100.60(511.85+37.53)
> > 5302.7: index-pack 32 threads   94.43(623.82+45.70)
> > 5302.8: index-pack 40 threads   93.64(702.88+47.61)
> >
> > I don't think any of this is _particularly_ relevant to your case, but
> > it really seems to me that the default of capping at 3 threads is too
> > low.
> 
> Looking back at the multithread commit, I think the trend was the same
> and I capped it because the gain was not proportional to the number of
> cores we threw at index-pack anymore. I would not be opposed to
> raising the cap though (or maybe just remove it)

I'm not sure what the right cap would be. I don't think it's static;
we'd want ~4 threads for the quad-core case quoted above, and 10-20 for
the 20-core one.

It does seem like there's an inflection point in the graph at N/2
threads. But then maybe that's just because these are hyper-threaded
machines, so "N/2" is the actual number of physical cores, and the
inflated CPU times above that are just because we can't turbo-boost
then, so we're actually clocking slower. Multi-threaded profiling and
measurement is such a mess. :)
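
To put rough numbers on the "inflated CPU times" point, here is a small
sketch of the arithmetic. The helper names are made up for illustration
and are not anything in git; the inputs are meant to be the
wall(user+sys) figures from the tables quoted above.

```c
#include <assert.h>

/*
 * Hypothetical helper: wall-clock speedup of a run relative to a
 * baseline run. A value above 1.0 means the run finished sooner.
 */
double wall_speedup(double base_wall, double wall)
{
	assert(base_wall > 0 && wall > 0);
	return base_wall / wall;
}

/*
 * Hypothetical helper: how much more total CPU time (user + sys) a run
 * burned compared to a baseline run. A value above 1.0 means more CPU
 * was spent to do the same work.
 */
double cpu_ratio(double base_user, double base_sys,
		 double user, double sys)
{
	double base_cpu = base_user + base_sys;

	assert(base_cpu > 0);
	return (user + sys) / base_cpu;
}
```

Feeding in the quad-core rows for 4 threads (117.14 wall, 344.07+4.29
CPU) and 8 threads (112.40 wall, 607.12+5.80 CPU), the wall-clock
speedup from 4 to 8 threads comes out at roughly 4%, while total CPU
goes up by roughly 75%.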

So I'd say the right answer is probably either online_cpus() or half
that. The latter would be more appropriate for the machines I have, but
I'd worry that it would leave performance on the table for non-Intel
machines.
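
As a minimal sketch of those two options (not a patch):
detect_online_cpus() below is only a stand-in for git's online_cpus()
and assumes the widely available sysconf(_SC_NPROCESSORS_ONLN), and
default_thread_count() and its "halve" flag are hypothetical names.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Stand-in for git's online_cpus(). Assumption: the platform provides
 * sysconf(_SC_NPROCESSORS_ONLN). Falls back to 1 if the query fails.
 */
int detect_online_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? (int)n : 1;
}

/*
 * Hypothetical helper: choose a default thread count from the number
 * of logical CPUs. With halve set, use half of them (rounded up, so a
 * single-CPU machine still gets one thread), which approximates the
 * physical core count on a hyper-threaded machine.
 */
int default_thread_count(int ncpus, int halve)
{
	assert(ncpus >= 1);
	if (!halve)
		return ncpus;
	return (ncpus + 1) / 2;
}
```

With halving, the quad-core box with hyperthreading (8 logical CPUs)
would get 4 threads and the 20-core Xeon (40 logical CPUs) would get 20;
without it they would get 8 and 40.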

-Peff
