On Wed, Oct 03 2018, Jeff King wrote:

> On Wed, Oct 03, 2018 at 12:08:15PM -0700, Stefan Beller wrote:
>
>> I share these concerns in a slightly more abstract way, as
>> I would bucket the actions into two separate bins:
>>
>> One bin that throws away information.
>> this would include removing expired reflog entries (which
>> I do not think are garbage, or collection thereof), but their
>> usefulness is questionable.
>>
>> The other bin would be actions that optimize but
>> do not throw away any information, repacking (without
>> dropping files) would be part of it, or the new
>> "write additional files".
>>
>> Maybe we can move all actions of the second bin into a new
>> "git optimize" command, and git gc would do first the "throw away
>> things" and then the optimize action, whereas clone would only
>> go for the second optimizing part?
>
> One problem with that world-view is that some of the operations do
> _both_, for efficiency. E.g., repacking will drop unreachable objects in
> too-old packs. We could actually be more aggressive in combining things
> here. For instance, a full object graph walk in linux.git takes 30-60
> seconds, depending on your CPU. But we do it at least twice during a gc:
> once to repack, and then again to determine reachability for pruning.
>
> If you generate bitmaps during the repack step, you can use them during
> the prune step. But by itself, the cost of generating the bitmaps
> generally outweighs the extra walk. So it's not worth generating them
> _just_ for this (but is an obvious optimization for a server which would
> be generating them anyway).

I don't mean to fan the flames of this obviously controversial "git gc does optimization" topic (which I didn't suspect there would be a debate about...), but a related thing I was wondering about the other day is whether we could have a gc.fsck option, and in the background do fsck while we were at it, and report this back via some facility like gc.log[1].
That would also fall into this category of more work we could do while we're doing a full walk anyway, but, as with what you're suggesting, it would require some refactoring.

1. Well, one that doesn't suck; see https://public-inbox.org/git/87inc89j38.fsf@xxxxxxxxxxxxxxxxxxx/ and https://public-inbox.org/git/87d0vmck55.fsf@xxxxxxxxxxxxxxxxxxx/ etc.
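To make the "one walk, several consumers" idea concrete, here is a toy sketch in Python. It is not how git's gc, repack, prune, or fsck are implemented: the object graph is a plain dict of hypothetical object ids, object types are ignored, and prune expiry (gc.pruneExpire) is left out. It only shows that a single reachability walk from the refs can feed the repack decision (what to keep), the prune decision (what is unreachable), and an fsck-style finding (referenced but missing objects) at the same time.

```python
from collections import deque


def single_walk(graph, refs):
    """Walk the toy object graph once, starting from the ref tips.

    graph: hypothetical mapping of object id -> list of referenced ids.
    Returns (reachable, missing): the ids reached, and the ids that were
    referenced but are absent from the graph (an fsck-style finding).
    """
    reachable = set()
    missing = set()
    queue = deque(refs)
    while queue:
        oid = queue.popleft()
        if oid in reachable or oid in missing:
            continue  # already visited
        if oid not in graph:
            missing.add(oid)  # dangling reference, report it
            continue
        reachable.add(oid)
        queue.extend(graph[oid])
    return reachable, missing


def plan_gc(graph, refs):
    """Reuse the one walk for all three consumers (ignores expiry)."""
    reachable, missing = single_walk(graph, refs)
    to_repack = sorted(reachable)              # keep: reachable objects
    to_prune = sorted(set(graph) - reachable)  # drop: unreachable objects
    return to_repack, to_prune, sorted(missing)


if __name__ == "__main__":
    toy = {
        "c2": ["c1", "t2"], "c1": ["t1"],
        "t1": ["b1"], "t2": ["b1", "b9"],  # "b9" does not exist
        "b1": [], "old": ["b1"],           # "old" is unreachable
    }
    print(plan_gc(toy, ["c2"]))
```

With the toy graph above, the walk keeps c2, c1, t2, t1 and b1, marks "old" as prunable, and reports "b9" as missing, all from one traversal.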