On Fri, Oct 05, 2018 at 10:01:31PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > There's unfortunately not a fast way of doing that. One option would be
> > to keep a counter of "ungraphed commit objects", and have callers update
> > it. Anybody admitting a pack via index-pack or unpack-objects can easily
> > get this information. Commands like fast-import can do likewise, and
> > "git commit" obviously increments it by one.
> >
> > I'm not excited about adding a new global on-disk data structure (and
> > the accompanying lock).
>
> You don't really need a new global datastructure to solve this
> problem. It would be sufficient to have git-gc itself write out a 4-line
> text file after it runs saying how many tags, commits, trees and blobs
> it found on its last run.
>
> You can then fuzzily compare object counts v.s. commit counts for the
> purposes of deciding whether something like the commit-graph needs to be
> updated, while assuming that whatever new data you have has similar
> enough ratios of those as your existing data.

I think this is basically the same thing as Stolee's suggestion to keep
the total object count in the commit-graph file. The only difference
here is that we know the actual ratio of commits to blobs for this
particular repository. But I don't think we need to know that. As you
said, this is fuzzy anyway, so a single number for "update the graph
when there are N new objects" is likely enough.

If you had a repository with an unusually large tree, you'd end up
rebuilding the graph more often. But I think that would probably be OK,
as we're primarily trying not to waste time doing a graph rebuild when
we've only done a small amount of other work. But if we just shoved a
ton of objects through index-pack, then we did a lot of work, whether
those were commit objects or not.

-Peff
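(For readers following along, the single-number heuristic being discussed
could be sketched roughly as below. The function name, the stored count,
and the threshold value are all illustrative assumptions, not actual git
internals or a proposed patch.)

```python
# Hypothetical sketch of the "update the graph when there are N new
# objects" heuristic. We compare the repository's current total object
# count against the count recorded at the last commit-graph rebuild;
# the threshold and all names here are assumptions for illustration.

REBUILD_THRESHOLD = 1000  # "N new objects"; the actual value would be tunable

def should_rebuild_graph(objects_at_last_rebuild: int, objects_now: int) -> bool:
    """Return True once enough new objects have accumulated since the
    last rebuild, regardless of whether they are commits, trees, or blobs."""
    return objects_now - objects_at_last_rebuild >= REBUILD_THRESHOLD

# A handful of new commits stays under the threshold and skips the
# rebuild; shoving a large pack through index-pack crosses it.
print(should_rebuild_graph(50_000, 50_010))  # False
print(should_rebuild_graph(50_000, 75_000))  # True
```

Note how this captures the point above: the decision keys off total work
done (objects admitted), not off the commit/blob ratio, so a repository
with an unusually large tree simply rebuilds a bit more often.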