> On 10 Jun 2017, at 10:06, Jeff King <peff@xxxxxxxx> wrote:
>
> On Fri, Jun 09, 2017 at 02:03:18PM +0200, Lars Schneider wrote:
>
>>> I agree the existing message isn't great. There should probably be a
>>> big advise() block explaining what's going on (and that expert users
>>> can disable).
>>
>> How about this?
>>
>> diff --git a/builtin/gc.c b/builtin/gc.c
>> index c2c61a57bb..12ee212544 100644
>> --- a/builtin/gc.c
>> +++ b/builtin/gc.c
>> @@ -473,9 +473,18 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>>  	if (pack_garbage.nr > 0)
>>  		clean_pack_garbage();
>>
>> -	if (auto_gc && too_many_loose_objects())
>> -		warning(_("There are too many unreachable loose objects; "
>> -			"run 'git prune' to remove them."));
>> +	if (auto_gc && too_many_loose_objects()) {
>> +		warning(_("Auto packing did not lead to optimal results as the "
>> +			"repository contains too many unreachable objects."));
>> +		advice(_("Unreachable objects are Git objects (commits, files, ...) "
>> +			"that are not referenced by any branch or tag. This might happen "
>> +			"if you use 'git rebase' or if you delete branches. Auto packing "
>> +			"only prunes unreachable objects that are older than 2 weeks "
>> +			"(default, overridable by the config variable 'gc.pruneExpire'). "
>> +			"Please run 'git prune' to prune all unreachable objects for "
>> +			"optimal repository performance."));
>> +	}
>
> s/advice/advise/, of course. This should probably be protected by a new
> entry in advice_config[] in advice.c.
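> Roughly like the following; this is only a sketch assuming the current
> two-field advice_config[] layout, and the "gcprune" / advice_gc_prune
> names are placeholders:
>
>     /* advice.h */
>     extern int advice_gc_prune;
>
>     /* advice.c */
>     int advice_gc_prune = 1;
>
>     static struct {
>         const char *name;
>         int *preference;
>     } advice_config[] = {
>         /* ... existing entries ... */
>         { "gcprune", &advice_gc_prune },
>     };
>
>     /* builtin/gc.c: guard the new hint */
>     if (advice_gc_prune)
>         advise(_("Unreachable objects ..."));
>
> Then expert users can silence the hint with
> "git config advice.gcPrune false".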
>
> But I assume you are most interested in the text. I think it
> simultaneously goes into too much and too little detail. I think the
> warning itself should just say _what_ we observed: after garbage
> collection, there were still enough objects to trigger a gc. And then
> the hint doesn't need to go into the details of why we prune or what
> unreachable objects are. Those can be cross-referenced with other
> documentation. I think we need to focus on what the warning means, and
> whether and how the user would correct it.
>
> Maybe:
>
>   warning: too many loose objects remain after garbage collection
>   hint: Automatic garbage collection is triggered when there are a
>   hint: large number of unpacked objects in the repository. Unreachable
>   hint: objects that are more recent than gc.pruneExpire are not
>   hint: pruned. If there are too many of these recent loose
>   hint: objects, automatic garbage collection may be triggered more
>   hint: frequently than necessary. You may run "git prune" now to
>   hint: prune all unreachable objects, regardless of their age.
>
> I was tempted to suggest that we find and report the correct "prune"
> cutoff that would let us avoid auto-gc: i.e., sort the unreachable
> objects by timestamp and find the cutoff that would drop enough of
> them to leave fewer than `gc.auto`. That in theory makes things a bit
> safer. It's probably not a good idea, though:
>
>   1. Telling the user to run `git prune --expire=37.minutes.ago` is
>      just going to confuse them. We could hide it behind a
>      command-line option like `git prune --expire-auto-gc` or
>      something, though.
>
>   2. Now that we try to keep recent chunks, the analysis isn't quite
>      so easy. You may have a single recent commit that references a
>      ton of old history, and only dropping that commit would help. So
>      the analysis is harder than a simple sort-and-cutoff, but it also
>      means that the prune times are likely to skew close to "now".
>
>   3. If we just show them how to prune the minimal amount, then
>      they're likely to hit this message again soon.
>
> So that's probably a dead end.
>
> To be honest, the fact that we have to write this warning at all is a
> sign that Git is not doing a very good job. The best place to spend
> effort would be to teach git-gc to pack all of the unreachable objects
> into a single "cruft" pack, so this problem doesn't happen at all (and
> it's way more efficient, as well).
>
> The big problem with that approach is that we lose individual-object
> timestamps. Each object just gets the timestamp of its surrounding
> pack, so as we continually run auto-gc, the cruft-pack timestamp would
> keep getting updated and we'd never drop objects. So we'd need some
> auxiliary file (e.g., pack-1234abcd.times) that stores the per-object
> timestamps. This can be as small as a 32- or 64-bit int per object,
> since we can just index it by the existing object list in the pack
> .idx.

Why can't we generate a new cruft pack on every gc run that detects too
many unreachable objects? That would not be as efficient as a single
cruft pack, but it should still be far more efficient than keeping the
objects loose, no? Plus, chances are that the existing cruft packs
would be purged by the next gc run anyway.

- Lars

> The trickiest part would be when an object's timestamp gets freshened
> (because somebody tried to write it again but we optimized out the
> write). Updating the timestamps in the .times file would probably work
> atomically, though we usually avoid writing in the middle of a file
> (we certainly can't portably do so via mmap, and I can't think of
> another case where we do seeked writes). It might be sufficient for
> objects in the cruft pack to just do the actual loose object write
> instead of optimizing it out. It should happen very rarely.
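> Concretely, the .times file could be nothing more than a small header
> followed by one timestamp per object, stored in the same order as the
> object list in the pack .idx, so that freshening the Nth object is a
> single seeked write. A rough sketch (the format, struct, and function
> names are all made up):
>
>     #include <stdint.h>
>     #include <time.h>
>     #include <unistd.h>
>     #include <arpa/inet.h>
>
>     /*
>      * Hypothetical pack-1234abcd.times layout: a header, then one
>      * network-order uint32 epoch per object, indexed by the object's
>      * position in the pack .idx.
>      */
>     struct times_header {
>         uint32_t signature;   /* e.g. "TIMS" */
>         uint32_t version;
>         uint32_t nr_objects;  /* must match the .idx object count */
>     };
>
>     /* Freshen the timestamp of the object at .idx position "nth". */
>     static int freshen_packed_time(int fd, uint32_t nth, time_t now)
>     {
>         uint32_t t = htonl((uint32_t)now);
>         off_t pos = sizeof(struct times_header) + (off_t)nth * sizeof(t);
>
>         /* a single aligned 4-byte pwrite; no read-modify-write needed */
>         if (pwrite(fd, &t, sizeof(t), pos) != sizeof(t))
>             return -1;
>         return 0;
>     }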