Re: [PATCH 03/10] builtin/gc.c: ignore cruft packs with `--keep-largest-pack`

On 4/18/2023 6:39 AM, Jeff King wrote:
> On Mon, Apr 17, 2023 at 07:03:08PM -0400, Taylor Blau wrote:

I agree with the prior discussion that gc.bigPackThreshold is
currently misbehaving, and that making it ignore cruft packs is
the best way to fix that behavior in this series.

>> It is possible that in the future we could support writing multiple
>> cruft packs (we already handle the presence of multiple cruft packs
>> fine, just don't expose an easy way for the user to write >1 of them).
>> And at that point we would be able to relax this patch a bit and allow
>> `gc.bigPackThreshold` to cover cruft packs, too. But in the meantime,
>> the benefit of avoiding loose object explosions outweighs the possible
>> drawbacks here, IMHO.
> 
> I wondered if that interface might be an option to say "hey, I have a
> gigantic cruft file I want to carry forward, please leave it alone".
> 
> But if you have a giant cruft pack that is making your "git gc" too
> slow, it will eventually age out on its own. And if you're impatient,
> then "git gc --prune=now" is probably the right tool.
> 
> And if you really did want to keep rolling it forward for some reason,
> then I'd think marking it with ".keep" would be the best thing (and
> maybe even dropping the mtimes file? I'm not sure how a kept-cruft
> pack does or should behave).

Generally, it's probably a good idea to (later) create a separate knob
for "don't rewrite the objects in a 'big' cruft pack unless you need
to". For situations where cruft objects are being collected and not
regularly pruned, this would help avoid repacking all unreachable
objects into a single giant pack when only a small number of objects
were newly discovered to be unreachable.

The important times where we'd want to consider a 'big' cruft pack
for rewrite are:

 1. Some objects in the cruft pack are being pruned.
 2. Some objects in the cruft pack need updated mtimes.

However, in the typical case that we are adding new cruft objects
and not changing the mtimes of existing unreachable objects, we could
create a sensible limit on the size of a cruft pack to be rewritten
during normal maintenance.
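
To make the idea concrete, here is a rough sketch of that decision in
Python. It is not how 'git repack --cruft' behaves today, and nothing
in it is an existing Git API: CruftPack, should_rewrite() and
max_rewrite_bytes are hypothetical names, and the 2GB default is only
an assumption taken from the low end of the range suggested in the
next paragraph.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class CruftPack:
    """Hypothetical summary of one existing cruft pack."""
    size_bytes: int
    has_prunable_objects: bool  # case 1: some objects are being pruned
    needs_mtime_update: bool    # case 2: some objects need updated mtimes


def should_rewrite(pack: CruftPack, max_rewrite_bytes: int = 2 * GIB) -> bool:
    """Decide whether an existing cruft pack is rewritten by this repack.

    max_rewrite_bytes is an assumed knob; no such setting is defined
    in this series.
    """
    # A pack that must change is rewritten regardless of its size.
    if pack.has_prunable_objects or pack.needs_mtime_update:
        return True
    # Otherwise only roll small packs forward; a 'big' pack is left alone
    # and the new cruft objects go into a separate pack.
    return pack.size_bytes <= max_rewrite_bytes
```

With this rule, a 5GB cruft pack that needs no pruning and no mtime
updates is left untouched, while the same pack is rewritten as soon as
any of its objects expire.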

My personal preference would be something between 2GB and 10GB, which
seems like a decent balance between "size of cruft pack" and "number of
cruft packs" for most repositories. Since none of the objects are
reachable, we don't really care about them having good deltas for things
like fetches and clones. The benefit of reducing the time spent in 'git
repack --cruft' outweighs the slight disk space savings from having a
single cruft pack, in my opinion.

Thanks,
-Stolee


