On Wed, 2017-02-08 at 14:08 -0500, Jeff King wrote:
> On Wed, Feb 08, 2017 at 02:05:42PM -0500, David Turner wrote:
>
> > On Wed, 2017-02-08 at 09:44 -0800, Junio C Hamano wrote:
> > > Duy Nguyen <pclouds@xxxxxxxxx> writes:
> > >
> > > > On second thought, perhaps gc.autoDetach should default to false if
> > > > there's no tty, since its main point it to stop breaking interactive
> > > > usage. That would make the server side happy (no tty there).
> > >
> > > Sounds like an idea, but wouldn't that keep the end-user coming over
> > > the network waiting after accepting a push until the GC completes, I
> > > wonder. If an impatient user disconnects, would that end up killing
> > > an ongoing GC? etc.
> >
> > Regardless, it's impolite to keep the user waiting. So, I think we
> > should just not write the "too many unreachable loose objects" message
> > if auto-gc is on. Does that sound OK?
>
> I thought the point of that message was to prevent auto-gc from kicking
> in over and over again due to objects that won't actually get pruned.
>
> I wonder if you'd want to either bump the auto-gc object limit, or
> possibly reduce the gc.pruneExpire limit to keep this situation from
> coming up in the first place (or at least mitigating the amount of time
> it's the case).

Auto-gc might not succeed in pruning objects, but it will at least
reduce the number of packs, which is super-important for performance.
I think the intent of automatic gc is to have a git repository be
relatively low-maintenance from a server-operator perspective. (Side
note: it's fairly trivial for a user with push access to mess with the
check simply by pushing a bunch of objects whose SHAs start with 17.)

It seems odd that git gets itself into a state where it refuses to do
any maintenance just because at some point some piece of the
maintenance didn't make progress. Sure, I could change my
configuration, but that doesn't help the other folks (e.g.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813084 ) who run
into this.

I have three thoughts on this:

Idea 1: when gc --auto would issue this message, instead it could
create a file named gc.too-much-garbage (instead of gc.log), with this
message. If that file exists, and it is less than one day (?) old,
then we don't attempt to do a full gc; instead we just run
git repack -A -d. (If it's more than one day old, we just delete it
and continue anyway.)

Idea 2: Like idea 1, but instead of repacking, just smash the existing
packs together into one big pack. In other words, don't consider
dangling objects, or recompute deltas. Twitter has a tool called
"git combine-pack" that does this:
https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c

That's less space-efficient than a true repack, but it's no worse than
having the packs separate, and it's a win for read performance because
there's no need to do a linear search over N packs to find an object.

Idea 3: As I suggested last time, separate fatal and non-fatal errors.
If gc fails because of EIO or something, we probably don't want to
touch the disk anymore. But here, the worst consequence is that we
waste some processing power. And it's better to occasionally waste
processing power in a non-interactive setting than it is to do so when
a user will be blocked on it. So non-fatal warnings should go to
gc.log, and fatal errors should go to gc.fatal; gc.log won't block gc
from running. I think this is my preferred option.
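To make Idea 1 concrete, here is a minimal Python sketch of the
decision logic. The gc.too-much-garbage file name, the one-day cutoff,
and the repack fallback come from the proposal above; the function
name and everything else are hypothetical (a real change would of
course live in builtin/gc.c, not a script):

```python
import os
import time

MARKER_MAX_AGE = 24 * 60 * 60  # one day, in seconds (the "(?)" above)

def choose_gc_action(git_dir, now=None):
    """Return the command auto-gc should run, per Idea 1.

    If a previous run left a fresh "too much garbage" marker, skip the
    full gc and just repack; if the marker is stale, delete it and
    fall through to a normal auto-gc.
    """
    now = time.time() if now is None else now
    marker = os.path.join(git_dir, "gc.too-much-garbage")  # hypothetical file
    if os.path.exists(marker):
        if now - os.path.getmtime(marker) < MARKER_MAX_AGE:
            # Fresh marker: don't retry the full gc yet, just
            # consolidate packs so read performance stays reasonable.
            return ["git", "repack", "-A", "-d"]
        # Stale marker: remove it and try a full gc again.
        os.unlink(marker)
    return ["git", "gc", "--auto"]
```

The point of the age check is that the repository still gets its packs
consolidated every time auto-gc fires, while the expensive (and
previously failing) prune attempt is retried at most once a day.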