On Tuesday, February 23, 2021 4:53:21 PM MST Jeff King wrote: > On Tue, Feb 23, 2021 at 02:41:09PM -0700, Martin Fick wrote: > > > > Would it make sense to somehow detect all new packs since the last > > > > rollup > > > > and always include them in the rollup no matter what their size? That > > > > is > > > > one thing that my git-exproll script did. One of the main reasons to > > > > do > > > > this was because newer packs tended to look big (I was using bytes to > > > > determine size), and newer packs were often bigger on disk compared to > > > > other packs with similar objects in them (I think you suggested this > > > > was > > > > due to the thickening of packs on receipt). Maybe roll up all packs > > > > with > > > > a timestamp "new enough", no matter how big they are? > > > > > > That works against the "geometric" part of the strategy, which is trying > > > to roll up in a sequence that is amortized-linear. I.e., we are not > > > always rolling up everything outside of the base pack, but trying to > > > roll up little into medium, and then eventually medium into large. If > > > you roll up things that are "too big", then you end up rewriting the > > > bytes more often, and your amount of work becomes super-linear. > > > > I'm not sure I follow, it would seem to me that it would stay linear, and > > be at most rewriting each new packfile once more than previously? Are you > > envisioning more work than that? > > Maybe I don't understand what you're proposing. > > The idea of the geometric repack is that by sorting by size and then > finding a "cutoff" within the size array, we can make sure that we roll > up a sufficiently small number of bytes in each roll-up that it ends up > linear in the size of the repo in the long run. But if we roll up > without regard to size, then our worst case is that the biggest pack is > the newest (imagine a repo with 10 small pushes and then one gigantic > one). So we roll that up with some small packs, doing effectively > O(size_of_repo) work. This isn't quite a fair evaluation, it should be O(size_of_push) I think? > And then in the next roll up we do it again, and so on. I should have clarified that the intent is to prevent this by specifying an mtime after the last rollup so that this should only ever happen once for new packfiles. It also means you probably need special logic to ensure this roll-up doesn't happen if there would only be one file in the rollup, -Martin -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation