On Wed, Feb 17, 2021 at 01:17:16PM -0500, Jeff King wrote: > On Wed, Feb 03, 2021 at 10:59:25PM -0500, Taylor Blau wrote: > > > Often it is useful to both: > > > > - have relatively few packfiles in a repository, and > > > > - avoid having so few packfiles in a repository that we repack its > > entire contents regularly > > > > This patch implements a '--geometric=<n>' option in 'git repack'. This > > allows the caller to specify that they would like each pack to be at > > least a factor times as large as the previous largest pack (by object > > count). > > > > Concretely, say that a repository has 'n' packfiles, labeled P1, P2, > > ..., up to Pn. Each packfile has an object count equal to 'objects(Pn)'. > > With a geometric factor of 'r', it should be that: > > > > objects(Pi) > r*objects(P(i-1)) > > > > for all i in [1, n], where the packs are sorted by > > > > objects(P1) <= objects(P2) <= ... <= objects(Pn). > > Just devil's advocating for a moment. > > [large push becoming the biggest pack in a repository] > > - it may have been more carefully packed (e.g., with a larger window > size, using "-f", etc) than the packs we got from pushes. We do > _mostly_ retain the deltas when we roll up the packs, so it probably > only has a small impact in practice (I'd expect in a few cases we'd > throw away deltas because a pushed pack contains a duplicate of its > base object that we added via --fix-thin). Yeah, agreed. > So I suspect it's probably OK in practice. These cases would happen > rarely, and the impact would not be all that big. The bitmap thing I'd > worry the most about. As part of a larger strategy involving a midx it > is taken care of, but people using just this new feature may not realize > that. The bitmaps of course are "just" an optimization, but it's hard to > say how dire things are when they don't exist. For many situations, > probably not very dire. But I know that on our servers, when repos lack > bitmaps, people notice the performance degradation. > > On the other hand, by definition this happens in a case where there are > more objects that have just been pushed (and are therefore not > bitmapped) than existed already. So you _already_ have a performance > problem either way until you get bitmap coverage of those new objects. I almost split my reply between this and the above paragraph to say exactly this. I think in this case you'd want to rewrite your bitmap from scratch either way (whether you were using multi-pack or traditional reachability bitmaps). > > --- a/Documentation/git-repack.txt > > +++ b/Documentation/git-repack.txt > > @@ -165,6 +165,17 @@ depth is 4095. > > Pass the `--delta-islands` option to `git-pack-objects`, see > > linkgit:git-pack-objects[1]. > > > > +-g=<factor>:: > > +--geometric=<factor>:: > > + Arrange resulting pack structure so that each successive pack > > + contains at least `<factor>` times the number of objects as the > > + next-largest pack. > > ++ > > +`git repack` ensures this by determining a "cut" of packfiles that need to be > > +repacked into one in order to ensure a geometric progression. It picks the > > +smallest set of packfiles such that as many of the larger packfiles (by count of > > +objects contained in that pack) may be left intact. > > I think we might need to make clear in the documentation how this > differs from other repacks, in that it is not considering reachability > at all. I like the term "roll up" to describe what is happening, but we > probably need to define that term clearly, as well. All fair suggestions, thanks. Thanks, Taylor