On Tue, Dec 29, 2020 at 11:35:56PM -0500, Derrick Stolee wrote:
> On 12/28/2020 6:15 AM, Abhishek Kumar via GitGitGadget wrote:
> > This patch series implements the corrected commit date offsets as
> > generation number v2, along with other pre-requisites.
>
> Abhishek,
>
> Thank you for this version. I appreciate your hard work on this topic,
> especially after GSoC ended and you returned to being a full-time
> student.
>
> My hope was that I could completely approve this series and only
> provide forward-fixes from here on out, as necessary. I think there
> are a few minor typos that you might want to address, but I was also
> able to understand your intention.
>
> I did make a particular case about a SEGFAULT I hit that I have been
> unable to replicate. I saw it both in my copy of torvalds/linux and of
> chromium/chromium. I have the file for chromium/chromium that is in a
> bad state where a GDAT value includes the bit saying it should be in
> the long offsets chunk, but that chunk doesn't exist. Further, that
> chunk doesn't exist in a from-scratch write.

I hope that validating the mixed generation chain while writing as well
was enough to fix the SEGFAULT.

> I'm now taking backups of my existing commit-graph files before any
> later test, but it doesn't repro for my Git repository or any other
> repo I try on purpose.
>
> However, I did some performance testing to double-check your numbers.
> I sent a patch [1] that helps with some of the hard numbers.
>
> [1] https://lore.kernel.org/git/pull.828.git.1609302714183.gitgitgadget@xxxxxxxxx/
>
> The big question is whether the overhead from using a slab to store
> the generation values is worth it. I still think it is, for these
> reasons:
>
> 1. Generation number v2 is measurably better than v1 in most user
>    cases.
>
> 2. Generation number v2 is slower than using committer date due to the
>    overhead, but _guarantees correctness_.
> I like to use "git log --graph -<N>" to compare against topological
> levels (v1), for various levels of <N>. When <N> is small, we hope to
> minimize the amount we need to walk using the extra commit-date
> information as an assistance. Repos like git/git and torvalds/linux
> use the philosophy of "base your changes on oldest applicable commit"
> enough that v1 struggles sometimes.
>
> git/git: N=1000
>
> Benchmark #1: baseline
>   Time (mean ± σ):     100.3 ms ±   4.2 ms    [User: 89.0 ms, System: 11.3 ms]
>   Range (min … max):    94.5 ms … 105.1 ms    28 runs
>
> Benchmark #2: test
>   Time (mean ± σ):      35.8 ms ±   3.1 ms    [User: 29.6 ms, System: 6.2 ms]
>   Range (min … max):    29.8 ms …  40.6 ms    81 runs
>
> Summary
>   'test' ran
>     2.80 ± 0.27 times faster than 'baseline'
>
> This is a dramatic improvement! Using my topo-walk stats commit, I see
> that v1 walks 58,805 commits as part of the in-degree walk while v2
> only walks 4,335 commits!
>
> torvalds/linux: N=1000 (starting at v5.10)
>
> Benchmark #1: baseline
>   Time (mean ± σ):      90.8 ms ±   3.7 ms    [User: 75.2 ms, System: 15.6 ms]
>   Range (min … max):    85.2 ms …  96.2 ms    31 runs
>
> Benchmark #2: test
>   Time (mean ± σ):      49.2 ms ±   3.5 ms    [User: 36.9 ms, System: 12.3 ms]
>   Range (min … max):    42.9 ms …  54.0 ms    61 runs
>
> Summary
>   'test' ran
>     1.85 ± 0.15 times faster than 'baseline'
>
> Similarly, v1 walked 38,161 commits compared to 4,340 by v2.
>
> If I increase N to something like 10,000, then usually these values
> get washed out due to the width of the parallel topics.

That's not too bad, as large N would be needed rather infrequently.

> The place we were still using commit-date as a heuristic was
> paint_down_to_common, which caused a regression the first time we used
> v1, at least for certain cases.
>
> Specifically, computing the merge-base in torvalds/linux between v4.8
> and v4.9 hit a strangeness about a pair of recent commits both based
> on a very old commit, but the generation numbers forced walking
> farther than necessary.
> This doesn't happen with v2, but we see the overhead cost of the
> slabs:
>
> Benchmark #1: baseline
>   Time (mean ± σ):     112.9 ms ±   2.8 ms    [User: 96.5 ms, System: 16.3 ms]
>   Range (min … max):   107.7 ms … 118.0 ms    26 runs
>
> Benchmark #2: test
>   Time (mean ± σ):     147.1 ms ±   5.2 ms    [User: 132.7 ms, System: 14.3 ms]
>   Range (min … max):   141.4 ms … 162.2 ms    18 runs
>
> Summary
>   'baseline' ran
>     1.30 ± 0.06 times faster than 'test'
>
> The overhead still exists for a more recent pair of versions (v5.0 and
> v5.1):
>
> Benchmark #1: baseline
>   Time (mean ± σ):      25.1 ms ±   3.2 ms    [User: 18.6 ms, System: 6.5 ms]
>   Range (min … max):    19.0 ms …  32.8 ms    99 runs
>
> Benchmark #2: test
>   Time (mean ± σ):      33.3 ms ±   3.3 ms    [User: 26.5 ms, System: 6.9 ms]
>   Range (min … max):    27.0 ms …  38.4 ms    105 runs
>
> Summary
>   'baseline' ran
>     1.33 ± 0.22 times faster than 'test'
>
> I still think this overhead is worth it. In case not everyone agrees,
> it _might_ be worth a command-line option to skip the GDAT chunk. That
> also prevents an ability to eventually wean entirely off generation
> number v1 and allow the commit date to take the full 64-bit column
> (instead of only 34 bits, saving 30 for topo-levels).

Thank you for the detailed benchmarking and discussion. I don't think
there is any disagreement on the utility of corrected commit dates so
far.

We will not run out of 34 bits for the commit date until the year 2514,
so I am not particularly worried about weaning off generation number v1
anytime soon.

> Again, such a modification should not be considered required for this
> series.
>
> > ----------------------------------------------------------------------------
> >
> > Improvements left for a future series:
> >
> >  * Save commits with generation data overflow and extra edge commits
> >    instead of looping over all commits. cf. 858sbel67n.fsf@xxxxxxxxx
> >  * Verify both topological levels and corrected commit dates when
> >    present. cf.
> >    85pn4tnk8u.fsf@xxxxxxxxx
>
> These seem like reasonable things to delay for a later series or for
> #leftoverbits
>
> Thanks,
> -Stolee

Thanks
- Abhishek