On Tuesday, February 23, 2021 3:15:12 PM MST Jeff King wrote:
> On Tue, Feb 23, 2021 at 12:54:56PM -0700, Martin Fick wrote:
> > > Yeah, this is definitely a heuristic that can get out of sync with
> > > reality. I think in general if you have base pack A and somebody pushes
> > > up B, C, and D in sequence, we're likely to roll up a single DBC (in
> > > that order) pack. Further pushes E, F, G would have newer mtimes. So we
> > > might get GFEDBC directly. Or we might get GFE and DBC, but the former
> > > would still have a newer mtime, so we'd create GFEDBC on the next run.
> > >
> > > The issues come from:
> > >
> > >   - we are deciding what to roll up based on size. A big push might not
> > >     get rolled up immediately, putting it out-of-sync with the rest of
> > >     the rollups.
> >
> > Would it make sense to somehow detect all new packs since the last rollup
> > and always include them in the rollup no matter what their size? That is
> > one thing that my git-exproll script did. One of the main reasons to do
> > this was because newer packs tended to look big (I was using bytes to
> > determine size), and newer packs were often bigger on disk compared to
> > other packs with similar objects in them (I think you suggested this was
> > due to the thickening of packs on receipt). Maybe roll up all packs with
> > a timestamp "new enough", no matter how big they are?
>
> That works against the "geometric" part of the strategy, which is trying
> to roll up in a sequence that is amortized-linear. I.e., we are not
> always rolling up everything outside of the base pack, but trying to
> roll up little into medium, and then eventually medium into large. If
> you roll up things that are "too big", then you end up rewriting the
> bytes more often, and your amount of work becomes super-linear.

I'm not sure I follow; it would seem to me that it would stay linear,
at most rewriting each new packfile once more than previously. Are you
envisioning more work than that?
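For readers following along, the "geometric" idea Jeff describes can be
sketched roughly as follows. This is only an illustration of the invariant
(each kept pack should be at least some factor larger than the sum of all
smaller packs; everything below the split gets rolled up), not git's actual
implementation -- git's `repack --geometric` works on object counts and has
more moving parts, while this sketch uses byte sizes and a factor of 2:

```python
# Rough sketch of a geometric rollup heuristic (illustrative only,
# not git's actual code). Packs are kept when each one is at least
# `factor` times larger than the combined size of all smaller packs;
# packs that violate that invariant are rolled up into one new pack.

def geometric_split(sizes, factor=2):
    """Given pack sizes, return (rolled_up, kept) as two lists."""
    sizes = sorted(sizes)
    # prefix[i] = total size of all packs smaller than sizes[i]
    prefix = [0]
    for s in sizes:
        prefix.append(prefix[-1] + s)
    # Find the highest pack that violates the invariant; it and
    # everything below it must be rolled up ("little into medium").
    split = 0
    for i in range(len(sizes)):
        if sizes[i] < factor * prefix[i]:
            split = i + 1
    return sizes[:split], sizes[split:]

# Three small pushes roll up together; the big packs stay untouched,
# which is what keeps the total rewriting work amortized-linear.
rolled, kept = geometric_split([1, 1, 1, 10, 100])
print(rolled, kept)  # -> [1, 1, 1] [10, 100]
```

Forcing a "new enough but big" pack into the rollup regardless of size
would be the equivalent of pushing the split point past packs that
already satisfy the invariant, so their bytes get rewritten again on
later rollups -- that repeated rewriting is where the super-linear
work Jeff mentions comes from.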
> Now whether that matters all that much or not is perhaps another
> discussion. The current strategy is mostly to repack all-into-one with
> no base, which is the worst possible case. So just about any rollup
> strategy will be an improvement. ;)

+1 Yes, while anything would be an improvement, this series' approach is
very good! Thanks for doing this!!

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation