On Fri, Mar 20, 2020 at 09:44:23PM -0600, Taylor Blau wrote:

> In my testing environment, this improves the time to "merge" a split
> commit-graph containing all reachable commits in the kernel by
> re-writing the same commit-graph (effectively measuring the time it
> takes to check that all of those commits still exist) from:
>
>   Attempt 1: 9.614
>   Attempt 2: 10.984
>   Attempt 3: 10.39
>   Attempt 4: 9.14
>   Attempt 5: 9.439
>
>   real    0m9.140s
>   user    0m8.207s
>   sys     0m0.602s
>
> to:
>
>   Attempt 1: 9.12
>   Attempt 2: 8.904
>   Attempt 3: 9.361
>   Attempt 4: 9.288
>   Attempt 5: 9.677
>
>   real    0m8.904s
>   user    0m8.208s
>   sys     0m0.596s
>
> yielding a modest ~2.6% improvement in the best timings from each run,
> and ~7.4% improvement on average.

That still seems really slow to me. If we were truly eliding the load of
most of the commit objects, I'd expect an order of magnitude or so
improvement.

For example, with a fully constructed commit graph in linux.git, I get:

  $ time git -c core.commitGraph=1 rev-list HEAD | wc -l
  886922

  real    0m1.088s
  user    0m0.659s
  sys     0m1.161s

  $ time git -c core.commitGraph=0 rev-list HEAD | wc -l
  886922

  real    0m7.185s
  user    0m6.729s
  sys     0m1.882s

Obviously not the same operation, but that should give us a rough idea
that commit graph lookups are 6-7 times cheaper than loading the actual
objects.

I don't remember the details of the case that originally led us towards
this patch. Can you share more of the setup you used to generate the
numbers above (which repo, but more importantly the commands to create
the initial state and then time the test)?

The patch otherwise still makes sense to me, but I suspect there are
other similar optimizations nearby that we'll need to do in tandem.

-Peff
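
P.S. My (possibly wrong) guess at the kind of setup you mean is roughly:

  # guesses on my part: linux.git as the repo, and that a second
  # "--split" write over an existing chain is what triggers the layer
  # merge whose timing you quoted
  $ git -C linux.git commit-graph write --reachable --split
  $ time git -C linux.git commit-graph write --reachable --split

but I may well be misremembering which knobs are needed to actually
force the merge, which is why I'm asking for the exact commands.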