On 7/28/2020 5:13 AM, Abhishek Kumar via GitGitGadget wrote:
> This patch series implements the corrected commit date offsets as generation
> number v2, along with other pre-requisites.
>
> Git uses topological levels in the commit-graph file for commit-graph
> traversal operations like git log --graph. Unfortunately, using topological
> levels can result in a worse performance than without them when compared
> with committer date as a heuristic. For example, git merge-base v4.8 v4.9
> on the Linux repository walks 635,579 commits using topological levels and
> walks 167,468 using committer date.
>
> Thus, the need for generation number v2 was born. New generation number
> needed to provide good performance, incremental updates, and backward
> compatibility. Due to an unfortunate problem, we also needed a way to
> distinguish between the old and new generation number without incrementing
> graph version.
>
> Various candidates were examined (https://github.com/derrickstolee/gen-test,
> https://github.com/abhishekkumar2718/git/pull/1). The proposed generation
> number v2, Corrected Commit Date with Monotonically Increasing Offsets
> performed much worse than committer date (506,577 vs. 167,468 commits walked
> for git merge-base v4.8 v4.9) and was dropped.
>
> Using Generation Data chunk (GDAT) relieves the requirement of backward
> compatibility as we would continue to store topological levels in Commit
> Data (CDAT) chunk. Thus, Corrected Commit Date was chosen as generation
> number v2. The Corrected Commit Date is defined as:
>
> For a commit C, let its corrected commit date be the maximum of the commit
> date of C and the corrected commit dates of its parents. Then corrected
> commit date offset is the difference between corrected commit date of C and
> commit date of C.
>
> We will introduce an additional commit-graph chunk, Generation Data chunk,
> and store corrected commit date offsets in GDAT chunk while storing
> topological levels in CDAT chunk. The old versions of Git would ignore GDAT
> chunk, using topological levels from CDAT chunk. In contrast, new versions
> of Git would use corrected commit dates, falling back to topological level
> if the generation data chunk is absent in the commit-graph file.
>
> Here's what's left for the PR (which I intend to take on with the second
> version of pull request):
>
> 1. Add an option to skip writing generation data chunk (to test whether new
> Git works without GDAT as intended).

This would be a good idea, if only as a GIT_TEST_* environment variable.
I think it is important that we have a test for the compatibility scenario
where we have an "old" commit-graph with the new code, and that we test that
reading and writing still work properly.

> 2. Handle writing to commit-graph for mismatched version (that is, merging
> all graphs into a new graph with a GDAT chunk).

This is an excellent thing to do. There are a few options when writing
an incremental commit-graph when the base graphs do not have the GDAT chunk:

 i. Do not write the GDAT chunk unless we are merging all levels (based
    on the merging strategy).

ii. Merge all levels, then write the GDAT chunk.

> 3. Update technical documentation.

Yes, I was going to ask for a patch that updates
Documentation/technical/commit-graph-format.txt.

This is an excellent v1. A lot of small things, but no really big issues.

Thanks,
-Stolee
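
P.S. To make the quoted definition concrete, here is a minimal Python sketch
that computes corrected commit dates and their offsets over a toy history.
It follows the definition in the cover letter exactly as written. The
function name corrected_commit_dates, the dict-based graph representation,
and the sample timestamps are all hypothetical illustrations and are not
taken from the patches, which are in C.

```python
def corrected_commit_dates(commit_dates, parents):
    """Compute corrected commit dates and offsets for a commit DAG.

    commit_dates: dict mapping commit id -> committer date (integer seconds).
    parents:      dict mapping commit id -> list of parent commit ids.
    Both shapes are assumptions made for this sketch only.
    """
    corrected = {}

    for root in commit_dates:
        # Iterative depth-first walk so that deep histories do not hit
        # Python's recursion limit.
        stack = [root]
        while stack:
            c = stack[-1]
            if c in corrected:
                stack.pop()
                continue
            pending = [p for p in parents.get(c, ()) if p not in corrected]
            if pending:
                # Parents must be resolved before the commit itself.
                stack.extend(pending)
                continue
            # Definition from the cover letter: maximum of the commit's own
            # date and the corrected commit dates of its parents.
            corrected[c] = max(
                [commit_dates[c]] + [corrected[p] for p in parents.get(c, ())]
            )
            stack.pop()

    # The offset is what the cover letter proposes storing in the GDAT chunk.
    offsets = {c: corrected[c] - commit_dates[c] for c in commit_dates}
    return corrected, offsets


# Toy history with clock skew: B and C carry dates earlier than ancestor A.
dates = {"A": 100, "B": 90, "C": 95, "D": 120}
graph = {"A": [], "B": ["A"], "C": ["B"], "D": ["A", "C"]}
corrected, offsets = corrected_commit_dates(dates, graph)
```

In this toy history the skewed commits B and C are lifted to A's date, so
their offsets are non-zero, while A and D keep an offset of zero.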