On Sun, Oct 25, 2020 at 01:16:48AM +0200, Jakub Narębski wrote: > "Abhishek Kumar via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > While measuring performance with `git commit-graph write --reachable > > --changed-paths` on the linux repository led to around 1m40s for both > > HEAD and master (and could be due to fault in my measurements), it is > > still the "right" thing to do. > > I had to read the above paragraph several times to understand it, > possibly because I have expected here to be a fix for a performance > regression. The commit message for 3d112755 (commit-graph: examine > commits by generation number) describes reduction of computation time > from 3m00s to 1m37s. So I would expect performance with HEAD (i.e. > before those changes) to be around 3m, not the same before and after > changes being around 1m40s. > > Can anyone recheck this before-and-after benchmark, please? My hunch is that our heuristic to fall back to the commits 'date' value is saving us here. commit_gen_cmp() first compares the generation numbers, breaking ties by 'date' as a heuristic. But since all generation number queries return GENERATION_NUMBER_INFINITY during writing, we're relying on our heuristic entirely. I haven't looked much further than that, other than to see that I could get about a ~4sec speed-up with this patch as compared to v2.29.1 in the computing Bloom filters region on the kernel. > Anyway, it might be more clear to write it as the following: > > On the Linux kernel repository, this patch didn't reduce the > computation time for 'git commit-graph write --reachable > --changed-paths', which is around 1m40s both before and after this > change. This could be a fault in my measurements; it is still the > "right" thing to do. > > Or something like that. Assuming that we are in fact being saved by the "date" heuristic, I'd probably write the following commit message instead: Before computing Bloom filters, the commit-graph machinery uses commit_gen_cmp to sort commits by generation order for improved diff performance. 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) claims that this sort can reduce the time spent to compute Bloom filters by nearly half. But since c49c82aa4c (commit: move members graph_pos, generation to a slab, 2020-06-17), this optimization is broken, since asking for 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY while writing. Not all hope is lost, though: 'commit_graph_generation()' falls back to comparing commits by their date when they have equal generation number, and so since c49c82aa4c is purely a date comparison function. This heuristic is good enough that we don't seem to loose appreciable performance while computing Bloom filters. [Benchmark that we loose about ~4sec before/after c49c82aa4c9...] So, avoid the uesless 'commit_graph_generation()' while writing by instead accessing the slab directly. This returns the newly-computed generation numbers, and allows us to avoid the heuristic by directly comparing generation numbers. Thanks, Taylor