On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote: > This patch series implements the corrected commit date offsets as generation > number v2, along with other pre-requisites. > > Git uses topological levels in the commit-graph file for commit-graph > traversal operations like git log --graph. Unfortunately, using topological > levels can result in a worse performance than without them when compared > with committer date as a heuristics. For example, git merge-base v4.8 v4.9 > on the Linux repository walks 635,579 commits using topological levels and > walks 167,468 using committer date. > > Thus, the need for generation number v2 was born. New generation number > needed to provide good performance, increment updates, and backward > compatibility. Due to an unfortunate problem, we also needed a way to > distinguish between the old and new generation number without incrementing > graph version. > > Various candidates were examined (https://github.com/derrickstolee/gen-test, > https://github.com/abhishekkumar2718/git/pull/1). The proposed generation > number v2, Corrected Commit Date with Mononotically Increasing Offsets > performed much worse than committer date (506,577 vs. 167,468 commits walked > for git merge-base v4.8 v4.9) and was dropped. > > Using Generation Data chunk (GDAT) relieves the requirement of backward > compatibility as we would continue to store topological levels in Commit > Data (CDAT) chunk. Thus, Corrected Commit Date was chosen as generation > number v2. The Corrected Commit Date is defined as: > > For a commit C, let its corrected commit date be the maximum of the commit > date of C and the corrected commit dates of its parents. Then corrected > commit date offset is the difference between corrected commit date of C and > commit date of C. > > We will introduce an additional commit-graph chunk, Generation Data chunk, > and store corrected commit date offsets in GDAT chunk while storing > topological levels in CDAT chunk. The old versions of Git would ignore GDAT > chunk, using topological levels from CDAT chunk. In contrast, new versions > of Git would use corrected commit dates, falling back to topological level > if the generation data chunk is absent in the commit-graph file. > > Thanks to Dr. Stolee, Dr. Narębski, and Taylor for their reviews on the > first version. > > I look forward to everyone's reviews! > > Thanks > > * Abhishek > > > ---------------------------------------------------------------------------- > > Changes in version 2: > > * Add tests for generation data chunk. > * Add an option GIT_TEST_COMMIT_GRAPH_NO_GDAT to control whether to write > generation data chunk. > * Compare commits with corrected commit dates if present in > paint_down_to_common(). > * Update technical documentation. > * Handle mixed graph version commit chains. > * Improve commit messages for > * Revert unnecessary whitespace changes. > * Split uint_32 -> timestamp_t change into a new commit. This version looks to be in really good shape, for the most part. I feel the need to point out that this is a very mature submission, and it covers many of the bases we expect from a quality series, even at only the second version. My comments deal with some high-level "patch series strategy" considerations, mostly. There are a few concerns about correctness in the complicated cases of mixed commit-graphs with or without the GDAT chunk, and how we could perhaps test those cases. The best way would be to update the 'verify' subcommand to check the GDAT chunk. That's a good feature to have, but would be unnecessary bloat to this series. Depending on timing, we might want to hold this series in 'next' until such an implementation is submitted. (I expect that Abhishek's GSoC internship would be over at that point, so I volunteer to send those patches after this series stabilizes.) Thank you for your careful work on such a complicated subject! Thanks, -Stolee