Re: [PATCH v4 0/8] repack: support repacking into a geometric sequence


 



On Wed, Feb 24, 2021 at 11:13:53AM -0700, Martin Fick wrote:

> > The idea of the geometric repack is that by sorting by size and then
> > finding a "cutoff" within the size array, we can make sure that we roll
> > up a sufficiently small number of bytes in each roll-up that it ends up
> > linear in the size of the repo in the long run. But if we roll up
> > without regard to size, then our worst case is that the biggest pack is
> > the newest (imagine a repo with 10 small pushes and then one gigantic
> > one). So we roll that up with some small packs, doing effectively
> > O(size_of_repo) work.
> 
> This isn't quite a fair evaluation, it should be O(size_of_push) I think?

Sorry, I had a longer example, but then cut it down in the name of
simplicity. But I think I made it too simple. :)

You can imagine more pushes after the gigantic one, in which case we'd
roll them up with the gigantic push. So that gigantic one is part of
multiple sequential rollups, until it is itself rolled up further.

But...

> > And then in the next roll up we do it again, and so on. 
>  
> I should have clarified that the intent is to prevent this by specifying an 
> mtime after the last rollup so that this should only ever happen once for new 
> packfiles. It also means you probably need special logic to ensure this roll-up 
> doesn't happen if there would only be one file in the rollup, 

Yes, I agree that if you record a cut point, and then avoid rolling up
across it, then you'd only consider the single push once. You probably
want to record the actual pack set rather than just an mtime cutoff,
though, since Git sometimes updates the mtime on packs (it freshens a
pack whenever it optimizes out an object write for an object that is
already in that pack).
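
To make the "record the pack set" idea concrete, here is a minimal
sketch. It is not anything Git does today; the function name and the
idea of keeping the recorded set as a plain collection of pack names
are assumptions for illustration only. It also includes the "don't
roll up a single file" check mentioned above.

```python
def packs_to_roll_up(current_packs, rolled_up_packs):
    """Pick packs that arrived after the last roll-up.

    Hypothetical helper, not part of Git. Both arguments are iterables
    of pack names. Membership in the recorded set is used instead of an
    mtime comparison, so a freshened (mtime-bumped) old pack is never
    mistaken for a new one.
    """
    candidates = sorted(set(current_packs) - set(rolled_up_packs))
    # Rolling up a single pack would just rewrite it, so skip that case.
    if len(candidates) < 2:
        return []
    return candidates
```

After a roll-up, the recorded set would be replaced by the packs that
exist at that point, including the newly written one.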

One of the nice things about looking only at the pack sizes is that you
don't have to record that cut point. :) But it's possible you'd want to
for other reasons (e.g., you may spend extra work to find good deltas in
your on-disk packs, so you want to know what is old and what is new in
order to discard on-disk deltas from pushed-up packs).
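
For comparison, the size-only approach can be sketched like this. This
is a simplified illustration of the "sort by size and find a cutoff"
idea, not the code from the patch series; the function name, the
default factor of 2, and the use of plain integers for pack sizes are
all assumptions.

```python
def geometric_split(sizes, factor=2):
    """Split pack sizes into (roll_up, keep).

    Simplified sketch, not Git's implementation. The packs in `keep`
    form a geometric progression: each is at least `factor` times the
    next smaller kept pack, and the smallest kept pack is at least
    `factor` times the combined size of everything being rolled up.
    """
    packs = sorted(sizes)
    n = len(packs)

    # Walk from the largest pack down until the progression breaks.
    i = n - 1
    while i > 0 and packs[i] >= factor * packs[i - 1]:
        i -= 1
    # If it broke at index i, packs[0..i] all have to be rolled up.
    split = i + 1 if i > 0 else 0

    # The rolled-up result must itself fit the progression, so absorb
    # larger packs for as long as they are too close to the running total.
    total = sum(packs[:split])
    while split < n and packs[split] < factor * total:
        total += packs[split]
        split += 1

    return packs[:split], packs[split:]
```

With ten small pushes of size 1 and one gigantic push of size 1000,
this rolls up only the ten small packs and leaves the big one alone,
with no recorded cut point needed.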

-Peff


