Re: [PATCH v2 8/8] builtin/repack.c: add '--geometric' option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 17, 2021 at 01:17:16PM -0500, Jeff King wrote:
> On Wed, Feb 03, 2021 at 10:59:25PM -0500, Taylor Blau wrote:
>
> > Often it is useful to both:
> >
> >   - have relatively few packfiles in a repository, and
> >
> >   - avoid having so few packfiles in a repository that we repack its
> >     entire contents regularly
> >
> > This patch implements a '--geometric=<n>' option in 'git repack'. This
> > allows the caller to specify that they would like each pack to be at
> > least a factor times as large as the previous largest pack (by object
> > count).
> >
> > Concretely, say that a repository has 'n' packfiles, labeled P1, P2,
> > ..., up to Pn. Each packfile has an object count equal to 'objects(Pn)'.
> > With a geometric factor of 'r', it should be that:
> >
> >   objects(Pi) > r*objects(P(i-1))
> >
> > for all i in [1, n], where the packs are sorted by
> >
> >   objects(P1) <= objects(P2) <= ... <= objects(Pn).
>
> Just devil's advocating for a moment.
>
> [large push becoming the biggest pack in a repository]
>
>   - it may have been more carefully packed (e.g., with a larger window
>     size, using "-f", etc) than the packs we got from pushes. We do
>     _mostly_ retain the deltas when we roll up the packs, so it probably
>     only has a small impact in practice (I'd expect in a few cases we'd
>     throw away deltas because a pushed pack contains a duplicate of its
>     base object that we added via --fix-thin).

Yeah, agreed.

> So I suspect it's probably OK in practice. These cases would happen
> rarely, and the impact would not be all that big. The bitmap thing I'd
> worry the most about. As part of a larger strategy involving a midx it
> is taken care of, but people using just this new feature may not realize
> that. The bitmaps of course are "just" an optimization, but it's hard to
> say how dire things are when they don't exist. For many situations,
> probably not very dire. But I know that on our servers, when repos lack
> bitmaps, people notice the performance degradation.
>
> On the other hand, by definition this happens in a case where there are
> more objects that have just been pushed (and are therefore not
> bitmapped) than existed already. So you _already_ have a performance
> problem either way until you get bitmap coverage of those new objects.

I almost split my reply between this and the above paragraph to say
exactly this. I think in this case you'd want to rewrite your bitmap
from scratch either way (whether you were using multi-pack or
traditional reachability bitmaps).

> > --- a/Documentation/git-repack.txt
> > +++ b/Documentation/git-repack.txt
> > @@ -165,6 +165,17 @@ depth is 4095.
> >  	Pass the `--delta-islands` option to `git-pack-objects`, see
> >  	linkgit:git-pack-objects[1].
> >
> > +-g=<factor>::
> > +--geometric=<factor>::
> > +	Arrange resulting pack structure so that each successive pack
> > +	contains at least `<factor>` times the number of objects as the
> > +	next-largest pack.
> > ++
> > +`git repack` ensures this by determining a "cut" of packfiles that need to be
> > +repacked into one in order to ensure a geometric progression. It picks the
> > +smallest set of packfiles such that as many of the larger packfiles (by count of
> > +objects contained in that pack) may be left intact.
>
> I think we might need to make clear in the documentation how this
> differs from other repacks, in that it is not considering reachability
> at all. I like the term "roll up" to describe what is happening, but we
> probably need to define that term clearly, as well.

All fair suggestions, thanks.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux