On Thu, Feb 20, 2025 at 09:26:38AM +0100, Pierre Ossman wrote:
> On 20/02/2025 04:03, Han Young wrote:
> > On Wed, Feb 19, 2025 at 5:58 PM Pierre Ossman <ossman@xxxxxxxxx> wrote:
> > > We tried gc.bigPackThreshold in the hope it would force it to reuse
> > > packs better. But all we got instead was duplication. It still creates
> > > new packs with everything. It just stopped removing the old ones.
> >
> > Is the repo partially cloned? git-repack will always pack promisor
> > packs even if it's a keep pack. This patch would fix it
> > https://lore.kernel.org/git/2728513.vuYhMxLoTh@xxxxxxxxxxxxxxxxxxxx/
> >
>
> Yes, the big offender is often partially cloned. So that could be part of
> it, thanks.
>
> But we're seeing it in other repositories as well. E.g. I have a long-lived
> TigerVNC repository where the biggest pack file is just one week old. In
> that case, it's merely 21 MiB, so it's not a practical issue. But it does
> show that git keeps replacing it.
>
> Anything I/we can do to shed more light on the issue?

Well, one of the interesting things to learn would be how often you end
up updating those repositories. You have discovered "gc.autoPackLimit"
already, which determines when exactly Git is going to repack existing
packfiles into one, and mentioned that it doesn't seem to help you. But
whether it does or doesn't help really depends on how frequently you
gain new packfiles in the impacted repositories.

When you have fast-moving repositories and developers fetch several
times per day, then it is quite likely that they accumulate multiple new
packfiles per day. And thus, it's not all that unexpected that you will
have to repack the whole repository rather regularly.

If so, this is working as designed. You can tune the parameters for how
often Git will do an all-into-one repack, but also have to keep in mind
that the more packfiles there are, the less efficient Git will in
general be.

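To get a feel for how fetch frequency and "gc.autoPackLimit" interact,
here is a back-of-the-envelope sketch. It assumes that every fetch
leaves behind exactly one new packfile and that nothing else triggers a
repack; both are simplifications, and the function name is made up for
illustration. The value 50 is the documented default of
"gc.autoPackLimit".

```python
def days_between_full_repacks(new_packs_per_day: float,
                              auto_pack_limit: int = 50) -> float:
    """Estimate the number of days between two all-into-one repacks.

    Simplified model (an assumption, not Git's actual code path): right
    after a full repack one packfile remains, and auto-gc consolidates
    again once the pack count exceeds auto_pack_limit.
    """
    if new_packs_per_day <= 0:
        raise ValueError("new_packs_per_day must be positive")
    if auto_pack_limit <= 0:
        # Setting gc.autoPackLimit to 0 disables the pack-count check.
        return float("inf")
    # Starting from one pack, the limit is exceeded once
    # auto_pack_limit additional packs have accumulated.
    return auto_pack_limit / new_packs_per_day


if __name__ == "__main__":
    for rate in (1, 7, 25):
        days = days_between_full_repacks(rate)
        print(f"{rate:>2} new packs/day: full repack about every {days:.1f} days")
```

Under these assumptions, a repository that gains around seven packfiles
per day would see its biggest pack rewritten about once a week with the
default limit, and raising the limit stretches that interval
proportionally.
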
That being said, there is an alternative: Git nowadays doesn't use
git-gc(1) anymore to perform auto-maintenance, but instead it invokes
git-maintenance(1). And that command allows the user to pick what tasks
should be performed. By default it does indeed use git-gc(1) under the
hood, but you can also ask it not to do so and instead use an
alternative mechanism to pack your objects.

The alternative would be the "incremental-repack" task. This task does
not use git-gc(1) with its incremental/all-into-one repack split, but
instead uses git-multi-pack-index(1). git-maintenance(1) tweaks the
`--batch-size` parameter of `git multi-pack-index repack` so that it
typically doesn't have to repack the one large packfile, but combines at
least two smaller ones.

I use a mechanism like that, which I've configured as follows:

    [maintenance "commit-graph"]
        enabled = true
    [maintenance "gc"]
        enabled = false
    [maintenance "incremental-repack"]
        enabled = true
    [maintenance "loose-objects"]
        enabled = true
    [maintenance "pack-refs"]
        enabled = true

I think this strategy still isn't quite optimal, as nowadays we should
probably make use of `git repack --geometric` instead of manually
computing batch sizes. This would ensure that the packfiles present in
the repository form a geometric sequence regarding their size, so you
end up repacking the biggest packfile very infrequently. Such a task has
not been implemented yet, but it shouldn't be all that hard to do,
either.

Patrick
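
The geometric idea mentioned above can be illustrated with a small
model. The sketch below decides which packs would be rolled up into one
new pack so that the remaining ones form a geometric progression by
object count. It is a simplified illustration loosely modelled on what
`git repack --geometric=<n>` is documented to do, not Git's
implementation; the function name and the example numbers are invented.

```python
def geometric_split(object_counts, factor=2):
    """Return (rolled_up, kept) pack sizes for a simplified geometric repack.

    Packs are described only by their object counts. The kept packs are
    left untouched; the rolled-up packs would be combined into one new pack.
    """
    packs = sorted(object_counts)

    # Walk down from the largest pack until a pack is found that is not
    # at least `factor` times as large as its smaller neighbour.
    i = len(packs) - 1
    while i > 0 and packs[i] >= factor * packs[i - 1]:
        i -= 1
    # Both packs of the offending pair go into the new pack.
    split = i + 1 if i > 0 else 0

    # The combined pack may itself be too large relative to the next
    # bigger pack; keep absorbing packs until the progression holds.
    total = sum(packs[:split])
    while split < len(packs) and packs[split] < factor * total:
        total += packs[split]
        split += 1

    return packs[:split], packs[split:]


if __name__ == "__main__":
    print(geometric_split([1000, 3, 100, 2, 1]))
    print(geometric_split([1, 2, 4, 8]))
```

In the first example the three small packs get combined while the two
large ones are left alone, which is the property that keeps the biggest
packfile from being rewritten on every run. The second example already
forms a geometric sequence, so nothing is rolled up.
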