Re: [PATCH v2 5/8] p5303: measure time to repack with keep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 17, 2021 at 02:13:47PM -0500, Taylor Blau wrote:

> > > That's because the code paths around handling .keep files are known to
> > > scale badly; they look in every single pack file to find each object.
> >
> > Well, part of it is just that with 1000 packs we have 20 times as many
> > objects that are actually getting packed with --stdin-packs, compared to
> > the 50-pack case. IIRC, each pack is a fixed-size slice and then the
> > residual is put into the .keep pack. So the fact that the time gets
> > closer to a full repack as we add more packs is expected: we are asking
> > pack-objects to do more work!
> 
> No, the residual base pack isn't marked as kept on-disk. But the
> --stdin-packs test treats it as such, by passing '^pack-$base_pack.pack'
> as input to '--stdin-packs' (thus marking it as kept in-core).

Sorry, I perhaps shouldn't have said ".keep" here. But it's the same
thing, isn't it? The 50 pack case is packing 50*pack_size objects
(because it's excluding everything else that is in the base pack we mark
as keep-in-core), and the 1000-pack case is packing 1000*pack_size
objects (for the same reason).

So any patterns we see between them have more to do with that, than how
the keep-handling code scales with the number of non-kept packs.

> > For showing the impact of the optimizations in patches 7 and 8, I think
> > doing a full repack with --honor-pack-keep is a better test. Because
> > then we're always doing a full traversal, and most of the work continues
> > to scale with the repo size (though obviously not the actual shuffling
> > of packed bytes around). That would get rid of the weird "no work to do"
> > case in the single-pack tests, too.
> 
> I think you're suggesting that we change the "repack ($nr_packs)" test
> to have the residual pack marked as kept (so we're measuring time it
> takes to repack everything that _isn't_ in the base pack)?
> 
> That would allow a more direct comparison, but I think it's loosing out
> on an important aspect which is how long it takes to pack the entire
> repository. Maybe we want three.

That was what I was suggesting, but I think it's equivalent to what your
--stdin-packs is testing. I guess the most interesting thing would
actually be an _additional_ pack mark as .keep (and that pack does not
even have to contain anything interesting -- the point is how much
effort it costs to find that out. Of course the bigger it is the more
pronounced the effect of avoiding lookups in it).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux