On Fri, Jun 09, 2017 at 02:03:18PM +0200, Lars Schneider wrote: > > I agree the existing message isn't great. There should probably be a big > > advise() block explaining what's going on (and that expert users can > > disable). > > How about this? > > diff --git a/builtin/gc.c b/builtin/gc.c > index c2c61a57bb..12ee212544 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -473,9 +473,18 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > if (pack_garbage.nr > 0) > clean_pack_garbage(); > > - if (auto_gc && too_many_loose_objects()) > - warning(_("There are too many unreachable loose objects; " > - "run 'git prune' to remove them.")); > + if (auto_gc && too_many_loose_objects()) { > + warning(_("Auto packing did not lead to optimal results as the " > + "repository contains too many unreachable objects.")); > + advice(_("Unreachable objects are Git objects (commits, files, ...) " > + "that are not referenced by any branch or tag. This might happen " > + "if you use 'git rebase' or if you delete branches. Auto packing " > + "only prunes unreachable objects that are older than 2 weeks " > + "(default, overridable by the config variable 'gc.pruneExpire'). " > + "Please run 'git prune' to prune all unreachable objects for " > + "optimal repository performance.")); > + } s/advice/advise/, of course. This probably be protected by a new entry in advice_config[] in advice.c. But I assume you are most interested in the text. I think it simultaneously goes into too much and too little detail. I think the warning itself should just say _what_ we observed: after garbage collection, there were still enough objects to trigger a gc. And then the hint doesn't need to go into the details of why we prune or what unreachable objects are. Those can be cross-referenced with other documentation. I think we need to focus on what the warning means, and whether and how they would correct it. Maybe: warning: too many loose objects remain after garbage collection hint: Automatic garbage collection is triggered when there are a hint: large number of unpacked objects in the repository. Unreachable hint: objects that are more recent than gc.pruneExpire are not hint: pruned. If there are too many of these recent loose hint: objects, automatic garbage collection may be triggered more hint: frequently than necessary. You may run "git prune" now to hint: prune all unreachable objects, regardless of their age. I was tempted to suggest that we find and report the correct "prune" cutoff that would let us avoid auto-gc. I.e., sort the unreachable objects by timestamp and find the cutoff that will drop enough to leave fewer than `gc.auto`. That in theory makes things a bit safer. That's probably not a good idea, though: 1. Telling the user to run `git prune --expire=37.minutes.ago` is just going to confuse them. We could hide it behind a command line option like `git prune --expire-auto-gc` or something, though. 2. Now that we try to keep recent chunks, the analysis isn't quite so easy. You may have a single recent commit that references a ton of old history, and only dropping that commit would help. So the analysis is harder than a simple sort-and-cutoff, but it also means that the prune times are likely to skew close to "now". 3. If we just show them how to prune the minimal amount, then they're likely to just hit this message again soon. So that's probably a dead end. To be honest, the fact that we have to write this warning at all is a sign that Git is not doing a very good job. The best place to spend effort would be to teach git-gc to pack all of the unreachable objects into a single "cruft" pack, so this problem doesn't happen at all (and it's way more efficient, as well). The big problem with that approach is that we lose individual-object timestamps. Each object just gets the timestamp of its surrounding pack, so as we continually ran auto-gc, the cruft-pack timestamp would get updated and we'd never drop objects. So we'd need some auxiliary file (e.g., pack-1234abcd.times) that stores the per-object timestamps. This can be as small as a 32- or 64-bit int per object, since we can just index it by the existing object list in the pack .idx. The trickiest part would be when an object's timestamp gets freshened (because somebody tried to write it again but we optimized out the write). Updating the timestamps in the .times file would probably work atomically, though we usually avoid writing in the middle of a file (we certainly can't portably do so via mmap, and I can't think of another case where we do seeked writes). It might be sufficient for objects in the cruft pack to just do the actual loose object write instead of optimizing it out. It should happen very rarely. -Peff