Re: [PATCH v2] repack: Add option to preserve and prune old pack files

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 12 Mar 2017 11:03:44 -0700

Jeff King <peff@xxxxxxxx> writes:

> I can think of one downside of a time-based solution, though: if you run
> multiple gc's during the time period, you may end up using a lot of disk
> space (one repo's worth per gc). But that's a fundamental tension in the
> problem space; the whole point is to waste disk to keep helping old
> processes.

Yes.  If you want to help a process that mmaps a packfile and wants
to keep using it for N seconds, no matter how many times somebody
else runs "git repack" while you are doing your work within that
timeframe, you somehow need to make sure the NFS server knows the
file is still in use for those N seconds.
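
To illustrate what an expiry-based prune could look like, here is a
minimal sketch in Python rather than anything taken from the patch.
The function name prune_expired, the directory argument and the
file names are all hypothetical, and the sketch assumes a preserved
file's mtime is set to the time it was preserved (a plain rename
would not do that by itself).

```python
import os
import time

def prune_expired(preserved_dir, max_age_seconds, now=None):
    """Remove files in preserved_dir older than max_age_seconds.

    Hypothetical helper: age is judged by mtime, which is assumed to
    have been set when the pack was moved aside.
    """
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(preserved_dir)):
        path = os.path.join(preserved_dir, name)
        if not os.path.isfile(path):
            continue
        # Only the age of each file matters, not how many repacks
        # have happened since it was preserved.
        if now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Because only age is consulted, any number of repacks inside the
window leaves every preserved set in place, which is exactly the
disk-space cost described above.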

> But you may want a knob that lets you slide between those two
> things. For instance, if you kept a sliding window of N sets of
> preserved packs, and ejected from one end of the window (regardless of
> time), while inserting into the other end. James' existing patch is that
> same strategy with a hardcoded window of "1".

Again, yes.  But then the user gets no guarantee of how long a
process can live without getting broken by the NFS server losing
track of a packfile that is still in use.  I suggested the
"expiry" based approach essentially because I do not see a useful
tradeoff afforded by having such a knob.
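
A minimal sketch of the sliding-window scheme shows why it gives no
time guarantee.  This is an illustration in Python, not code from
the patch; preserve_set and max_sets are hypothetical names.

```python
from collections import deque

def preserve_set(window, new_set, max_sets):
    """Insert new_set at one end of the window and eject from the
    other end until at most max_sets remain.  Returns the ejected
    sets, oldest first.  Time is never consulted.
    """
    window.append(new_set)
    ejected = []
    while len(window) > max_sets:
        ejected.append(window.popleft())
    return ejected
```

With max_sets set to 1 this behaves like the hardcoded window
mentioned in the quoted text.  How long a set survives depends only
on how often repack runs: with max_sets set to 3 and a repack every
minute, a preserved pack is gone after about three minutes.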

> The other variable you can manipulate is whether to gc in the first
> place. E.g., don't gc if there are N preserved sets (or sets consuming
> more than N bytes, or whatever). You could do that check outside of git
> entirely (or in an auto-gc hook, if you're using it).

Yes, "don't gc/repack more than once within N seconds" may also be
an alternative, and may be more useful in general because it covers
a general source of waste from running gc too frequently, not just
the accumulation of preserved packs.
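
As a rough sketch of such a throttle done outside of git, assuming
a stamp file whose mtime records the last run (gc_allowed,
record_gc and the stamp path are hypothetical names, not anything
git provides):

```python
import os
import time

def gc_allowed(stamp_path, min_interval_seconds, now=None):
    """Return True if no gc has been recorded within the interval."""
    now = time.time() if now is None else now
    try:
        last = os.path.getmtime(stamp_path)
    except FileNotFoundError:
        # No gc has been recorded yet, so nothing to throttle.
        return True
    return now - last >= min_interval_seconds

def record_gc(stamp_path, now=None):
    """Create the stamp file if needed and set its mtime to now."""
    now = time.time() if now is None else now
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, (now, now))
```

A wrapper script, or a pre-auto-gc hook that exits non-zero when
gc_allowed() is false, could use this to skip the run; where the
stamp file lives is left open here.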