Re: [PATCH v2 5/8] p5303: measure time to repack with keep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 16, 2021 at 06:58:16PM -0500, Jeff King wrote:
> On Wed, Feb 03, 2021 at 10:59:13PM -0500, Taylor Blau wrote:
>
> > From: Jeff King <peff@xxxxxxxx>
> >
> > This is the same as the regular repack test, except that we mark the
> > single base pack as "kept" and use --assume-kept-packs-closed. The
>
> I don't think that option exists anymore. I guess we are just using
> --stdin-packs, which causes us to mark a pack as kept.
>
> I think we could just mark it in the filesystem and use
> --honor-pack-keep, which would make it independent of your new feature.
> At first I was going to say "but it doesn't matter either way", but...
>
> > theory is that this should be faster than the normal repack, because
> > we'll have fewer objects to traverse and process.
> >
> > Here are some timings on a recent clone of the kernel. In the
> > single-pack case, there is nothing do since there are no non-excluded
> > packs:
> >
> >   5303.5: repack (1)                          57.42(54.88+10.64)
> >   5303.6: repack with --stdin-packs (1)       0.01(0.01+0.00)
> >
> > and in the 50-pack case, it is much faster to use `--stdin-packs`, since
> > we avoid having to consider any objects in the excluded pack:
> >
> >   5303.10: repack (50)                        71.26(88.24+4.96)
> >   5303.11: repack with --stdin-packs (50)     3.49(11.82+0.28)
> >
> > but our improvements vanish as we approach 1000 packs.
> >
> >   5303.15: repack (1000)                      215.64(491.33+14.80)
> >   5303.16: repack with --stdin-packs (1000)   198.79(380.51+7.97)
> >
> > That's because the code paths around handling .keep files are known to
> > scale badly; they look in every single pack file to find each object.
>
> Well, part of it is just that with 1000 packs we have 20 times as many
> objects that are actually getting packed with --stdin-packs, compared to
> the 50-pack case. IIRC, each pack is a fixed-size slice and then the
> residual is put into the .keep pack. So the fact that the time gets
> closer to a full repack as we add more packs is expected: we are asking
> pack-objects to do more work!

No, the residual base pack isn't marked as kept on-disk. But the
--stdin-packs test treats it as such, by passing '^pack-$base_pack.pack'
as input to '--stdin-packs' (thus marking it as kept in-core).

> For showing the impact of the optimizations in patches 7 and 8, I think
> doing a full repack with --honor-pack-keep is a better test. Because
> then we're always doing a full traversal, and most of the work continues
> to scale with the repo size (though obviously not the actual shuffling
> of packed bytes around). That would get rid of the weird "no work to do"
> case in the single-pack tests, too.

I think you're suggesting that we change the "repack ($nr_packs)" test
to have the residual pack marked as kept (so we're measuring time it
takes to repack everything that _isn't_ in the base pack)?

That would allow a more direct comparison, but I think it's loosing out
on an important aspect which is how long it takes to pack the entire
repository. Maybe we want three.

What do you think?

> -Peff

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux