One of the remaining prerequisites for implementing generation number v2
was distinguishing between corrected commit dates with monotonically
increasing offsets and topological levels without incrementing the
generation number version.

Two approaches were proposed [1]:
1. New chunk for commit data (generation data chunk, "GDAT")
2. Metadata/versioning chunk

Since both approaches have their advantages and disadvantages, I wrote
up a prototype [2] to investigate their performance.

[1]: https://lore.kernel.org/git/86mu87qj92.fsf@xxxxxxxxx/
[2]: https://github.com/abhishekkumar2718/git/pull/1

TL;DR: I recommend that we use the generation data chunk approach.

Generation Data Chunk
=====================

We could move the generation number v2 into a separate chunk, storing
topological levels in CDAT and the corrected commit dates in a new
"GDAT" chunk. Thus, old Git would use generation number v1, and new Git
would use corrected commit dates from GDAT.

Using a generation data chunk has the advantage that we would no longer
be restricted to using 30 bits for the generation number. It also works
well for commit-graph chains with a mix of v1 and v2 generation numbers.

However, it increases the time required for I/O, as commit data and
generation numbers are no longer contiguous.

Note: While it also increases the disk space required for storing
commit-graph files by 8 bytes per commit, I don't consider it relevant,
especially on modern systems. A repository the size of the Linux
repository would be larger by a mere 7.2 MB.

Metadata / Versioning Chunk
===========================

We could also introduce an optional metadata chunk to store the
generation number version and store corrected date offsets in CDAT.
Since the offsets are backward compatible, old Git would still yield
correct results by treating the offsets as topological levels. New Git
would correctly use the offsets to create corrected commit dates.

It works just as well as generation number v1 in parsing and writing
commit-graph files.
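
To illustrate why offsets stored in CDAT can remain backward compatible,
here is a minimal Python sketch. It is not taken from the prototype. It
assumes one possible formulation of "monotonically increasing offsets":
a commit's offset must exceed every parent's offset, and its corrected
date (committer date + offset) must exceed every parent's corrected
date. Offsets start at 1 for root commits, mirroring topological levels.
The toy history and the names COMMITS and compute_offsets are
hypothetical.

```python
# Hypothetical toy history: commit -> (committer_date, parents).
# Entries are listed parents-first, so one pass over the dict suffices.
COMMITS = {
    "A": (100, []),
    "B": (90, ["A"]),        # committer date older than its parent (clock skew)
    "C": (150, ["A"]),
    "D": (120, ["B", "C"]),
}


def compute_offsets(commits):
    """Return (offsets, corrected_dates) under the assumed recurrence.

    offset(C) is the smallest value that is at least 1, larger than every
    parent's offset, and large enough that date(C) + offset(C) is larger
    than every parent's corrected date.
    """
    offsets = {}
    corrected = {}
    for name, (date, parents) in commits.items():
        offset = 1  # root commits start at 1, like topological levels
        for parent in parents:
            offset = max(
                offset,
                offsets[parent] + 1,           # offsets strictly increase
                corrected[parent] + 1 - date,  # corrected dates strictly increase
            )
        offsets[name] = offset
        corrected[name] = date + offset  # what new Git would reconstruct
    return offsets, corrected


offsets, corrected = compute_offsets(COMMITS)
for name in COMMITS:
    print(name, "offset", offsets[name], "corrected date", corrected[name])
```

In this sketch, a reader that only knows generation number v1 could
compare the offsets alone (D > B > A and D > C > A), while a newer
reader would add each offset to the committer date to recover the
corrected commit date.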

However, the generation numbers are still restricted to 30 bits in the
CDAT chunk, and it does not work well with commit-graph chains with a
mix of v1 and v2 generation numbers.

Performance
===========

| Command                        | Master | Metadata | Generation Data |
|--------------------------------|--------|----------|-----------------|
| git commit-graph write         | 14.45s | 14.28s   | 14.63s          |
| git log --topo-order -10000    | 0.211s | 0.206s   | 0.208s          |
| git log --topo-order -100 A..B | 0.019s | 0.015s   | 0.015s          |
| git merge-base A..B            | 0.137s | 0.137s   | 0.137s          |

- Metadata and generation data chunks perform better than master when
  using commit-graph files, since they use corrected commit dates.
- The increased I/O time for parsing GDAT does not affect performance as
  much as expected.
- Generation data commit-graph takes longer to write since more
  information is written into the file.

As using the commit-graph is much more frequent than writing it, we can
consider both approaches to perform equally well.

I prefer the generation data chunk approach as it also removes the
30-bit length restriction on generation numbers.

Thanks
Abhishek