Re: 2.18.0 Regression: packing performance and effectiveness

On Fri, Jul 20, 2018 at 7:28 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, Jul 19, 2018 at 04:11:01PM -0700, Elijah Newren wrote:
>
> > Looking at the output from Peff's instrumentation elsewhere in this
> > thread, I see a lot of lines like
> >    mismatched get: 32889efd307c7be376da9e3d45a78305f14ba73a = (, 28)
> > Does that mean it was reading the array when it wasn't ready?
>
> Yes, it looks like we saw a "get" without a "set". Though this could
> also be due to threading. The tracing isn't atomic with respect to the
> actual get/set operation, so it's possible that the ordering of the
> trace output does not match the ordering of the actual operations.
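Peff's point about tracing can be shown with a minimal sketch. It is hypothetical: slot_value, traced_set(), traced_get() and run_trace_demo() are invented for illustration and are not git's pack-objects code, and POSIX threads are assumed. If each trace record is appended inside the same critical section as the get or set it describes, then the log order is the operation order. A "get" logged before its "set" would then point to a real bug rather than to skew between the tracing and the operation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical slot table and tracer; names are illustrative, not git's. */
#define NSLOTS 8
#define MAX_TRACE (2 * NSLOTS)

struct trace_entry {
	char op;		/* 's' = set, 'g' = get */
	int slot;
	unsigned long value;
};

static unsigned long slot_value[NSLOTS];
static struct trace_entry trace_log[MAX_TRACE];
static size_t trace_len;
static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append one record; the caller must already hold slot_lock. */
static void trace_locked(char op, int slot, unsigned long value)
{
	if (trace_len < MAX_TRACE) {
		trace_log[trace_len].op = op;
		trace_log[trace_len].slot = slot;
		trace_log[trace_len].value = value;
		trace_len++;
	}
}

static void traced_set(int slot, unsigned long value)
{
	assert(slot >= 0 && slot < NSLOTS);
	pthread_mutex_lock(&slot_lock);
	slot_value[slot] = value;
	trace_locked('s', slot, value);	/* same critical section as the write */
	pthread_mutex_unlock(&slot_lock);
}

static unsigned long traced_get(int slot)
{
	unsigned long value;

	assert(slot >= 0 && slot < NSLOTS);
	pthread_mutex_lock(&slot_lock);
	value = slot_value[slot];
	trace_locked('g', slot, value);	/* same critical section as the read */
	pthread_mutex_unlock(&slot_lock);
	return value;
}

static void *trace_worker(void *arg)
{
	int slot = (int)(size_t)arg;

	traced_set(slot, 100 + (unsigned long)slot);
	traced_get(slot);
	return NULL;
}

/*
 * One set+get per thread, then scan the log: with tracing inside the
 * lock, a "get" can never appear before the "set" that preceded it.
 * Returns 1 if the log is consistent, 0 on a "mismatched get".
 */
static int run_trace_demo(void)
{
	pthread_t th[NSLOTS];
	int seen_set[NSLOTS] = {0};

	for (int i = 0; i < NSLOTS; i++)
		pthread_create(&th[i], NULL, trace_worker, (void *)(size_t)i);
	for (int i = 0; i < NSLOTS; i++)
		pthread_join(th[i], NULL);

	for (size_t i = 0; i < trace_len; i++) {
		const struct trace_entry *e = &trace_log[i];

		if (e->op == 's')
			seen_set[e->slot] = 1;
		else if (!seen_set[e->slot] ||
			 e->value != 100 + (unsigned long)e->slot)
			return 0;
	}
	return trace_len == MAX_TRACE;
}
```

If the trace were instead emitted after pthread_mutex_unlock() (with its own synchronization), records from different threads could land in the log out of order. That is the situation Peff describes.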
>
> > However, it's interesting to also look at the effect on packing
> > linux.git (on the same beefy hardware):
> >
> > Version  Pack (MB)  MaxRSS(kB)  Time (s)
> > -------  ---------  ----------  --------
> >  2.17.0     1279     11382932      632.24
> >  2.18.0     1279     10817568      621.97
> >  fix-v4     1279     11484168     1193.67
> >
> > While the pack size is nice and small, the original memory savings
> > added in 2.18.0 are gone and the performance is much worse.  :-(
>
> Interesting. I can't reproduce here. The fix-v4 case is only slightly
> slower than 2.18.0. Can you double-check that your compiler flags, etc.,
> were the same? Many times I've accidentally compared -O0 to -O2. :)

He ran 40 threads, though, and with that many threads lock contention
can get very expensive. Yeah, my money is also on lock contention.

> You might also try the patch below (on top of fix-v4), which moves the

Another thing Elijah could try is watching CPU utilization. If this is
lock contention, I think core utilization should be much lower, because
we spend more time waiting than actually doing work.
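The utilization check reduces to one ratio: (user + system CPU time) / (wall-clock time × number of cores). Below is a small sketch of it, assuming a POSIX system with getrusage() and clock_gettime(). The helper names are hypothetical. For a whole pack-objects run, GNU time's -v output reports the same inputs (user, system and elapsed time).

```c
#define _XOPEN_SOURCE 700	/* expose getrusage() and clock_gettime() in strict modes */
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/*
 * Rough core utilization: (user + system CPU seconds) divided by
 * (wall-clock seconds * cores). 1.0 means every core was busy the
 * whole time; much lower values mean threads spent time not running.
 */
static double core_utilization(double user_s, double sys_s,
			       double wall_s, int ncores)
{
	if (wall_s <= 0.0 || ncores <= 0)
		return 0.0;
	return (user_s + sys_s) / (wall_s * ncores);
}

static double timeval_seconds(struct timeval tv)
{
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

static double monotonic_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Measure this process's utilization while work() runs. */
static double measure_utilization(void (*work)(void), int ncores)
{
	struct rusage before, after;
	double start, end;

	getrusage(RUSAGE_SELF, &before);
	start = monotonic_seconds();
	work();
	end = monotonic_seconds();
	getrusage(RUSAGE_SELF, &after);

	return core_utilization(
		timeval_seconds(after.ru_utime) - timeval_seconds(before.ru_utime),
		timeval_seconds(after.ru_stime) - timeval_seconds(before.ru_stime),
		end - start, ncores);
}

/* Example single-threaded workload: a busy loop. */
static void spin_briefly(void)
{
	volatile unsigned long counter = 0;

	for (unsigned long i = 0; i < 50000000UL; i++)
		counter++;
}
```

With 40 threads on 40 cores, a value far below 1.0 would fit the lock-contention theory, although I/O or memory stalls can lower it too. The system-time share is worth watching on its own, because that is where Peff's dedicated-mutex patch saved time.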

> locking to its own dedicated mutex. That should reduce lock contention,

I think we could use cache_lock(), which is for non-odb shared data
(and delta_size[] fits this category).

> and it fixes the remaining realloc where I think we're still racy. On my

Yeah, it's not truly racy, as you also noted in another mail. I'll make
a note about this in the commit message.

> repack of linux.git, it dropped the runtime from 6m3s to 5m41s, almost
> entirely in system CPU.
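Peff's two changes can be sketched in isolation: giving the shared array its own mutex, and making the realloc safe. This is a hypothetical illustration only. struct delta_sizes, delta_size_set(), delta_size_get() and the demo helpers are invented names, not git's packing_data or its accessors, and POSIX threads are assumed. Both the growth path and every read take the same lock, so no reader can be dereferencing the old array while realloc() frees it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical growable side array of delta sizes guarded by its own
 * dedicated mutex. Illustrative only; not git's packing_data layout.
 */
struct delta_sizes {
	unsigned long *size;
	size_t alloc;
	pthread_mutex_t lock;	/* protects size and alloc, nothing else */
};

#define DELTA_SIZES_INIT { NULL, 0, PTHREAD_MUTEX_INITIALIZER }

/*
 * Grow and store under the lock, so no reader can still be using the
 * old pointer when realloc() releases it. Returns -1 if allocation fails.
 */
static int delta_size_set(struct delta_sizes *ds, size_t idx, unsigned long v)
{
	int ret = 0;

	pthread_mutex_lock(&ds->lock);
	if (idx >= ds->alloc) {
		size_t n = ds->alloc ? ds->alloc : 16;
		unsigned long *p;

		while (n <= idx)
			n *= 2;
		p = realloc(ds->size, n * sizeof(*p));
		if (!p) {
			ret = -1;
		} else {
			/* zero the new tail so unset entries read as 0 */
			memset(p + ds->alloc, 0, (n - ds->alloc) * sizeof(*p));
			ds->size = p;
			ds->alloc = n;
		}
	}
	if (!ret)
		ds->size[idx] = v;
	pthread_mutex_unlock(&ds->lock);
	return ret;
}

/* Read under the same lock; indexes never set read as 0. */
static unsigned long delta_size_get(struct delta_sizes *ds, size_t idx)
{
	unsigned long v = 0;

	pthread_mutex_lock(&ds->lock);
	if (idx < ds->alloc)
		v = ds->size[idx];
	pthread_mutex_unlock(&ds->lock);
	return v;
}

#define NTHREADS 4
#define PER_THREAD 1000
static struct delta_sizes demo_sizes = DELTA_SIZES_INIT;

/* Thread t writes indexes t, t + NTHREADS, ..., growing the array as it goes. */
static void *fill_worker(void *arg)
{
	size_t start = (size_t)arg;

	for (size_t i = 0; i < PER_THREAD; i++) {
		size_t idx = start + i * NTHREADS;
		delta_size_set(&demo_sizes, idx, idx + 1);
	}
	return NULL;
}

/* Fill concurrently, then check every index holds idx + 1. Returns 1 on success. */
static int run_fill_demo(void)
{
	pthread_t th[NTHREADS];
	int ok = 1;

	for (size_t t = 0; t < NTHREADS; t++)
		pthread_create(&th[t], NULL, fill_worker, (void *)t);
	for (size_t t = 0; t < NTHREADS; t++)
		pthread_join(th[t], NULL);
	for (size_t idx = 0; idx < NTHREADS * PER_THREAD; idx++)
		if (delta_size_get(&demo_sizes, idx) != idx + 1)
			ok = 0;
	return ok;
}
```

The open design question in this thread is which mutex plays the role of ds->lock. A dedicated lock, as in Peff's patch, only serializes accesses to this one array. Reusing an existing lock such as cache_lock() avoids adding another mutex, but these accesses then wait behind everything else that lock protects. With 40 threads hitting the accessors, that difference can matter.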
-- 
Duy


