On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 03:41:40PM -0400, Derrick Stolee wrote:
>
>> > So can we really just take (total_objects - commit_graph_objects) and
>> > compare it to some threshold?
>>
>> The commit-graph only stores the number of _commits_, not total objects.
>
> Oh, right, of course. That does throw a monkey wrench in that line of
> thought. ;)
>
> There's unfortunately not a fast way of doing that. One option would be
> to keep a counter of "ungraphed commit objects", and have callers update
> it. Anybody admitting a pack via index-pack or unpack-objects can easily
> get this information. Commands like fast-import can do likewise, and
> "git commit" obviously increments it by one.
>
> I'm not excited about adding a new global on-disk data structure (and
> the accompanying lock).

You don't really need a new global data structure to solve this problem.

It would be sufficient to have git-gc itself write out a 4-line text
file after it runs saying how many tags, commits, trees and blobs it
found on its last run.

You can then fuzzily compare object counts vs. commit counts for the
purposes of deciding whether something like the commit-graph needs to
be updated, while assuming that whatever new data you have has similar
enough type ratios to your existing data.

That's an assumption that'll hold well enough for big repos, where this
matters the most, and which tend to grow in fairly uniform ways as far
as their object type ratios go.

Databases like MySQL, PostgreSQL etc. pull similar tricks with their
fuzzy table statistics.
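
To make that concrete, here's a rough sketch of what the heuristic
could look like. The stats file name, its format, the current object
count, and the threshold are all invented for illustration; the point
is just the arithmetic of scaling object growth by the last known
commit ratio:

```shell
# Hypothetical stats file a git-gc run could leave behind
# (name and format are made up here):
stats_file=gc-object-stats
printf '%s\n' 'tags 500' 'commits 100000' 'trees 400000' 'blobs 900000' \
    >"$stats_file"

old_total=$(awk '{sum += $2} END {print sum}' "$stats_file")
old_commits=$(awk '$1 == "commits" {print $2}' "$stats_file")

# Pretend the repository now has this many objects in total (in reality
# you'd get a number from something like "git count-objects -v"):
new_total=1540000

# Estimated new commits = object growth scaled by the old commit ratio.
est_new_commits=$(( (new_total - old_total) * old_commits / old_total ))
echo "estimated new commits: $est_new_commits"

# If the estimate exceeds some threshold, the commit-graph is stale
# enough to be worth rewriting.
threshold=5000
if [ "$est_new_commits" -gt "$threshold" ]; then
    echo "would rewrite the commit-graph now"
fi
```

The estimate will be off whenever the new data's type ratios differ
from the old, but as argued above that error should be small for the
large repositories where this matters.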