On 2/1/2021 1:58 AM, Abhishek Kumar via GitGitGadget wrote: > Changes in version 7: > > * Moved the documentation patch ahead of "commit-graph: implement corrected > commit date" and elaborated on the introduction of generation number v2. The only change in this version is this commit message: > 11: e571f03d8bd ! 7: 8647b5d2e38 doc: add corrected commit date info > @@ Metadata > Author: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > > ## Commit message ## > - doc: add corrected commit date info > + commit-graph: document generation number v2 > > - With generation data chunk and corrected commit dates implemented, let's > - update the technical documentation for commit-graph. > + Git uses topological levels in the commit-graph file for commit-graph > + traversal operations like 'git log --graph'. Unfortunately, topological > + levels can perform worse than committer date when parents of a commit > + differ greatly in generation numbers [1]. For example, 'git merge-base > + v4.8 v4.9' on the Linux repository walks 635,579 commits using > + topological levels and walks 167,468 using committer date. Since > + 091f4cf3 (commit: don't use generation numbers if not needed, > + 2018-08-30), 'git merge-base' uses committer date heuristic unless there > + is a cutoff because of the performance hit. > + > + [1] https://lore.kernel.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@xxxxxxxxx/ > + > + Thus, the need for generation number v2 was born. As Git used to die > + when graph version understood by it and in the commit-graph file are > + different [2], we needed a way to distinguish between the old and new > + generation number without incrementing the graph version. > + > + [2] https://lore.kernel.org/git/87a7gdspo4.fsf@xxxxxxxxxxxxxxxxxxx/ > + > + The following candidates were proposed (https://github.com/derrickstolee/gen-test, > + https://github.com/abhishekkumar2718/git/pull/1): > + - (Epoch, Date) Pairs. > + - Maximum Generation Numbers. > + - Corrected Commit Date. > + - FELINE Index. > + - Corrected Commit Date with Monotonically Increasing Offsets. > + > + Based on performance, local computability, and immutability (along with > + the introduction of an additional commit-graph chunk which relieved the > + requirement of backwards-compatibility) Corrected Commit Date was chosen > + as generation number v2 and is defined as follows: > + > + For a commit C, let its corrected commit date be the maximum of the > + commit date of C and the corrected commit dates of its parents plus 1. > + Then corrected commit date offset is the difference between corrected > + commit date of C and commit date of C. As a special case, a root commit > + with the timestamp zero has corrected commit date of 1 to distinguish it > + from GENERATION_NUMBER_ZERO (that is, an uncomputed generation number). > + > + While it was proposed initially to store corrected commit date offsets > + within Commit Data Chunk, storing the offsets in a new chunk did not > + affect the performance measurably. The new chunk is "Generation DATa > + (GDAT) chunk" and it stores corrected commit date offsets while CDAT > + chunk stores topological level. The old versions of Git would ignore > + GDAT chunk, using topological levels from CDAT chunk. In contrast, new > + versions of Git would use corrected commit dates, falling back to > + topological level if the generation data chunk is absent in the > + commit-graph file. > + > + While storing corrected commit date offsets saves us 4 bytes per commit > + (as compared with storing corrected commit dates directly), it's however > + possible for the offset to overflow the space allocated. To handle such > + cases, we introduce a new chunk, _Generation Data Overflow_ (GDOV) that > + stores the corrected commit date. For overflowing offsets, we set MSB > + and store the position into the GDOV chunk, in a mechanism similar to > + the Extra Edges list chunk. > + > + For mixed generation number environment (for example new Git on the > + command line, old Git used by GUI client), we can encounter a > + mixed-chain commit-graph (a commit-graph chain where some of split > + commit-graph files have GDAT chunk and others do not). As backward > + compatibility is one of the goals, we can define the following behavior: > + > + While reading a mixed-chain commit-graph version, we fall back on > + topological levels as corrected commit dates and topological levels > + cannot be compared directly. > + > + When adding new layer to the split commit-graph file, and when merging > + some or all layers (replacing them in the latter case), the new layer > + will have GDAT chunk if and only if in the final result there would be > + no layer without GDAT chunk just below it. While that is a quality message, v6 has landed in 'next' and I've begun working off of that version. As Taylor attempted to say [1], this topic should be considered final and updates should be follow-ups on top. [1] https://lore.kernel.org/git/YBYLwpKdUfxCNwaz@nand.local/ (Of course, if Junio says differently, then listen to him.) Thanks, -Stolee