On Wed, Jul 18, 2018 at 03:11:38PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, I agree that deferring repeated gc's based on time is going to run
> > into pathological corner cases. OTOH, what we've found at GitHub is that
> > "gc --auto" is quite insufficient for scheduling maintenance anyway
> > (e.g., if a machine gets pushes to 100 separate repositories in quick
> > succession, you probably want to queue and throttle any associated gc).
>
> I'm sure you have better solutions for this at GitHub, but as an aside
> it might be interesting to add some sort of gc flock + retry setting for
> this use-case, i.e. even if you had 100 concurrent gc's due to
> too_many_*(), they'd wait + retry until they could get the flock on a
> given file.
>
> Then again this is probably too specific, and could be done with a
> pre-auto-gc hook too..

Yeah, I think any multi-repo solution is getting way too specific for
Git, and the best thing we can do is provide a hook.

I agree you could probably do this today with a pre-auto-gc hook (if it
skips gc it would just queue itself and return non-zero). Or even just
make a mark in a database that says "there was some activity here".

Since we have so much other infrastructure sitting between the user and
Git anyway, we do that marking at a separate layer which is already
talking to a database. ;)

Anyway, I do agree with your general notion that this isn't the right
approach for many situations, and auto-gc is a better solution for many
cases.

-Peff
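
For illustration only, a pre-auto-gc hook along the lines described above
(queue the repository for later maintenance, then return non-zero so the
inline gc is skipped) might look something like the sketch below. The
queue directory, the file-per-repo naming, and the external scheduler it
implies are all made up for the example; this is not anything Git ships
or anything GitHub actually uses.

  #!/usr/bin/env python3
  # Hypothetical pre-auto-gc hook sketch: rather than letting gc run
  # inline, record that this repository wants maintenance and exit
  # non-zero so that "git gc --auto" skips the actual work.  An
  # external scheduler (not shown) would scan QUEUE_DIR and run gc
  # with whatever queueing and throttling it likes.
  import os
  import sys

  QUEUE_DIR = "/var/tmp/gc-queue"   # made-up location for the example

  # Git runs hooks from the top of the working tree (or $GIT_DIR in a
  # bare repository), so this is a reasonable approximation of "which
  # repo triggered the gc".
  repo = os.path.abspath(os.environ.get("GIT_DIR", os.getcwd()))

  os.makedirs(QUEUE_DIR, exist_ok=True)

  # One marker file per repository; its presence means "there was some
  # activity here, gc is wanted".
  mark = os.path.join(QUEUE_DIR, repo.strip("/").replace("/", "%2f"))
  with open(mark, "w") as f:
      f.write(repo + "\n")

  # A non-zero exit from pre-auto-gc makes git skip the auto gc.
  sys.exit(1)

Exiting 0 instead would let the normal auto gc proceed, so the same hook
could fall back to git's built-in behavior if the queueing layer were
unavailable.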