Re: [PATCH v2 5/8] p5303: measure time to repack with keep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 03, 2021 at 10:59:13PM -0500, Taylor Blau wrote:

> From: Jeff King <peff@xxxxxxxx>
> 
> This is the same as the regular repack test, except that we mark the
> single base pack as "kept" and use --assume-kept-packs-closed. The

I don't think that option exists anymore. I guess we are just using
--stdin-packs, which causes us to mark a pack as kept.

I think we could just mark it in the filesystem and use
--honor-pack-keep, which would make it independent of your new feature.
At first I was going to say "but it doesn't matter either way", but...

> theory is that this should be faster than the normal repack, because
> we'll have fewer objects to traverse and process.
> 
> Here are some timings on a recent clone of the kernel. In the
> single-pack case, there is nothing do since there are no non-excluded
> packs:
> 
>   5303.5: repack (1)                          57.42(54.88+10.64)
>   5303.6: repack with --stdin-packs (1)       0.01(0.01+0.00)
> 
> and in the 50-pack case, it is much faster to use `--stdin-packs`, since
> we avoid having to consider any objects in the excluded pack:
> 
>   5303.10: repack (50)                        71.26(88.24+4.96)
>   5303.11: repack with --stdin-packs (50)     3.49(11.82+0.28)
> 
> but our improvements vanish as we approach 1000 packs.
> 
>   5303.15: repack (1000)                      215.64(491.33+14.80)
>   5303.16: repack with --stdin-packs (1000)   198.79(380.51+7.97)
> 
> That's because the code paths around handling .keep files are known to
> scale badly; they look in every single pack file to find each object.

Well, part of it is just that with 1000 packs we have 20 times as many
objects that are actually getting packed with --stdin-packs, compared to
the 50-pack case. IIRC, each pack is a fixed-size slice and then the
residual is put into the .keep pack. So the fact that the time gets
closer to a full repack as we add more packs is expected: we are asking
pack-objects to do more work!

For showing the impact of the optimizations in patches 7 and 8, I think
doing a full repack with --honor-pack-keep is a better test. Because
then we're always doing a full traversal, and most of the work continues
to scale with the repo size (though obviously not the actual shuffling
of packed bytes around). That would get rid of the weird "no work to do"
case in the single-pack tests, too.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux