Re: [PATCH 03/10] builtin/gc.c: ignore cruft packs with `--keep-largest-pack`

On Mon, Apr 17, 2023 at 07:03:08PM -0400, Taylor Blau wrote:

> On Mon, Apr 17, 2023 at 03:54:35PM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@xxxxxxxxxxxx> writes:
> >
> > >   - The same is true for `gc.bigPackThreshold`, if the size of the cruft
> > >     pack exceeds the limit set by the caller.
> >
> > This is not as cut-and-dried as the previous one.  "This pack
> > is so large that it is not worth rewriting it only to expunge a
> > handful of objects that are no longer reachable from it" is the main
> > motivation to use this configuration, but doesn't some part of the
> > same reasoning apply equally to a large cruft pack?  But let's
> > assume that the configuration is totally irrelevant to cruft packs
> > and read on.
> 
> This is an inherent design trade-off. I imagine that callers who want to
> avoid rewriting their (large) cruft packs would prefer to generate a new
> cruft pack on top with just the recently accumulated unreachable
> objects.
> 
> That kind of works, except when you need to prune objects that are
> already packed in an earlier cruft pack. With `gc.bigPackThreshold`
> set, there is no way to expire objects that live in cruft packs above
> that threshold.
> 
> A user may find themselves frustrated when their attempt to
> `git gc --prune` some sensitive object(s) from their repository
> doesn't appear to work, only to discover that `gc.bigPackThreshold`
> is set somewhere in their configuration.
> 
> Writing (largely) the same cruft pack to expunge a few objects isn't
> ideal, but it is better than the status quo. And if you have so many
> unreachable objects that this is a concern, it is probably time to prune
> anyway.

Yeah, what your patch does makes sense to me as a default behavior. In a
pre-cruft-pack world, those objects would all be left alone by
gc.bigPackThreshold (because they're loose), and the essence of
cruft-packs is creating a parallel universe where those ejected-to-loose
objects just happen to be stored in a more efficient format.
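
For concreteness, the setup in question looks something like this (the
threshold and expiry values are arbitrary, just for illustration):

    # packs larger than 2 GiB are normally kept out of the repack
    git config gc.bigPackThreshold 2g

    # with this patch, a cruft pack over that size is still rewritten,
    # so objects whose mtimes are past the expiry actually get pruned
    git gc --prune=2.weeks.ago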

> It is possible that in the future we could support writing multiple
> cruft packs (we already handle the presence of multiple cruft packs
> fine, just don't expose an easy way for the user to write >1 of them).
> And at that point we would be able to relax this patch a bit and allow
> `gc.bigPackThreshold` to cover cruft packs, too. But in the meantime,
> the benefit of avoiding loose object explosions outweighs the possible
> drawbacks here, IMHO.

I wondered if that interface might be an option to say "hey, I have a
gigantic cruft file I want to carry forward, please leave it alone".

But if you have a giant cruft pack that is making your "git gc" too
slow, it will eventually age out on its own. And if you're impatient,
then "git gc --prune=now" is probably the right tool.

And if you really did want to keep rolling it forward for some reason,
then I'd think marking it with ".keep" would be the best thing (and
maybe even dropping the mtimes file? I'm not sure how a kept-cruft
pack does or should behave).
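
Something like this, with a made-up pack name (and treating the mtimes
removal as a sketch, given the fuzzy semantics above):

    # mark the cruft pack as precious; gc/repack will leave it alone
    touch .git/objects/pack/pack-1234abcd.keep

    # optionally drop its mtimes file so it looks like an ordinary
    # kept pack rather than a cruft pack
    rm .git/objects/pack/pack-1234abcd.mtimes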

-Peff


