On 12/28/2020 6:15 AM, Abhishek Kumar via GitGitGadget wrote:
> This patch series implements the corrected commit date offsets as generation
> number v2, along with other pre-requisites.

Abhishek,

Thank you for this version. I appreciate your hard work on this topic,
especially after GSoC ended and you returned to being a full-time
student.

My hope was that I could completely approve this series and only
provide forward-fixes from here on out, as necessary. There are a few
minor typos that you might want to address, but I was able to
understand your intention despite them.

I did call out a SEGFAULT I hit that I have been unable to replicate.
I saw it both in my copy of torvalds/linux and of chromium/chromium. I
still have the chromium/chromium commit-graph file that is in a bad
state: a GDAT value has the bit set saying it should be in the long
offsets chunk, but that chunk doesn't exist. Further, that chunk
doesn't exist in a from-scratch write. I'm now taking backups of my
existing commit-graph files before any later test, but I cannot
reproduce the failure in my Git repository or in any other repo I try
on purpose.

However, I did some performance testing to double-check your numbers.
I sent a patch [1] that helps with some of the hard numbers.

[1] https://lore.kernel.org/git/pull.828.git.1609302714183.gitgitgadget@xxxxxxxxx/

The big question is whether the overhead from using a slab to store
the generation values is worth it. I still think it is, for these
reasons:

 1. Generation number v2 is measurably better than v1 in most user
    cases.

 2. Generation number v2 is slower than using committer date due to
    the overhead, but _guarantees correctness_.

I like to use "git log --graph -<N>" to compare against topological
levels (v1), for various values of <N>. When <N> is small, we hope to
minimize how much of the graph we need to walk, using the extra
commit-date information as an assist. Repos like git/git and
torvalds/linux follow the philosophy of "base your changes on the
oldest applicable commit" often enough that v1 sometimes struggles.
(A sketch of the benchmark setup appears after the merge-base
discussion below.)

git/git: N=1000

Benchmark #1: baseline
  Time (mean ± σ):     100.3 ms ±   4.2 ms    [User: 89.0 ms, System: 11.3 ms]
  Range (min … max):    94.5 ms … 105.1 ms    28 runs

Benchmark #2: test
  Time (mean ± σ):      35.8 ms ±   3.1 ms    [User: 29.6 ms, System: 6.2 ms]
  Range (min … max):    29.8 ms …  40.6 ms    81 runs

Summary
  'test' ran
    2.80 ± 0.27 times faster than 'baseline'

This is a dramatic improvement! Using my topo-walk stats commit, I see
that v1 walks 58,805 commits as part of the in-degree walk while v2
only walks 4,335 commits!

torvalds/linux: N=1000 (starting at v5.10)

Benchmark #1: baseline
  Time (mean ± σ):      90.8 ms ±   3.7 ms    [User: 75.2 ms, System: 15.6 ms]
  Range (min … max):    85.2 ms …  96.2 ms    31 runs

Benchmark #2: test
  Time (mean ± σ):      49.2 ms ±   3.5 ms    [User: 36.9 ms, System: 12.3 ms]
  Range (min … max):    42.9 ms …  54.0 ms    61 runs

Summary
  'test' ran
    1.85 ± 0.15 times faster than 'baseline'

Similarly, v1 walked 38,161 commits compared to 4,340 by v2.

If I increase N to something like 10,000, then these values usually
get washed out due to the width of the parallel topics.

The place where we were still using commit date as a heuristic was
paint_down_to_common(), which caused a regression the first time we
used v1, at least for certain cases. Specifically, computing the
merge-base in torvalds/linux between v4.8 and v4.9 hit a strangeness
about a pair of recent commits both based on a very old commit, where
the generation numbers forced us to walk farther than necessary.
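(For reference, the timings in this message come from hyperfine runs
shaped roughly like the sketch below. The paths ~/git-baseline,
~/git-test, and ~/linux are placeholders of my own choosing rather
than an exact record of my invocations; "baseline" is a build without
this series and "test" is a build with it, each reading a commit-graph
file written by the matching binary.)

  # Hypothetical build and repo locations; adjust to taste.
  # The "git log --graph -<N>" comparison, here with N=1000:
  hyperfine --warmup 3 \
    -n baseline '~/git-baseline/bin-wrappers/git -C ~/linux log --graph --oneline -1000 >/dev/null' \
    -n test '~/git-test/bin-wrappers/git -C ~/linux log --graph --oneline -1000 >/dev/null'

  # The merge-base case discussed just above and below:
  hyperfine --warmup 3 \
    -n baseline '~/git-baseline/bin-wrappers/git -C ~/linux merge-base v4.8 v4.9' \
    -n test '~/git-test/bin-wrappers/git -C ~/linux merge-base v4.8 v4.9'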
This doesn't happen with v2, but we do see the overhead cost of the
slabs:

Benchmark #1: baseline
  Time (mean ± σ):     112.9 ms ±   2.8 ms    [User: 96.5 ms, System: 16.3 ms]
  Range (min … max):   107.7 ms … 118.0 ms    26 runs

Benchmark #2: test
  Time (mean ± σ):     147.1 ms ±   5.2 ms    [User: 132.7 ms, System: 14.3 ms]
  Range (min … max):   141.4 ms … 162.2 ms    18 runs

Summary
  'baseline' ran
    1.30 ± 0.06 times faster than 'test'

The overhead still exists for a more recent pair of versions (v5.0 and
v5.1):

Benchmark #1: baseline
  Time (mean ± σ):      25.1 ms ±   3.2 ms    [User: 18.6 ms, System: 6.5 ms]
  Range (min … max):    19.0 ms …  32.8 ms    99 runs

Benchmark #2: test
  Time (mean ± σ):      33.3 ms ±   3.3 ms    [User: 26.5 ms, System: 6.9 ms]
  Range (min … max):    27.0 ms …  38.4 ms    105 runs

Summary
  'baseline' ran
    1.33 ± 0.22 times faster than 'test'

I still think this overhead is worth it. In case not everyone agrees,
it _might_ be worth a command-line option to skip writing the GDAT
chunk. Note, though, that keeping such an escape hatch would prevent
us from eventually weaning entirely off of generation number v1 and
letting the commit date take the full 64-bit column (instead of only
34 bits, with the other 30 saved for topo-levels). Again, such a
modification should not be considered required for this series.

> ----------------------------------------------------------------------------
>
> Improvements left for a future series:
>
>  * Save commits with generation data overflow and extra edge commits instead
>    of looping over all commits. cf. 858sbel67n.fsf@xxxxxxxxx
>  * Verify both topological levels and corrected commit dates when present.
>    cf. 85pn4tnk8u.fsf@xxxxxxxxx

These seem like reasonable things to delay for a later series or for
#leftoverbits.

Thanks,
-Stolee