On 6/28/2018 1:11 AM, Grant Welch wrote:
I recently read the "Supercharging the Git Commit Graph blog by Derrick Stolee. I found the article interesting and wanted to verify the performance numbers for myself. Then that led me to want to know more about the implementation, so I read the technical design notes in commit-graph.txt, and then I jumped into the format documentation in commit-graph-format.txt. Along the way, I noticed a few issues. They might just be errors in the documentation, but I figured it was worth documenting my entire process just to be sure. "Supercharging the Git Commit Graph", by Derrick Stolee: https://blogs.msdn.microsoft.com/devops/2018/06/25/supercharging-the-git-commit-graph/ # TL;DR I found a few discrepencies between the documentation in commit-graph-format.txt and the results that I observed for myself. 1. The "Commit Data" chunk signature is documented to be 'CGET', but it should be 'CDAT'. commit-graph.c:18 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
Thanks for catching this! Thankfully, this is an easy fix, as we only need to update the documentation.
2. The "no parent" value is documented to be 0xffffffff, but is actually 0x70000000. commit-graph.c:34 #define GRAPH_PARENT_NONE 0x70000000
This is a more important mistake, as it affects the data that was written in the commit-graph file.
Part of the problem is that leading hex digit of 0x7 which in binary is 0b0111. We already designed a limit of at most 2^{31}-1 (~2.1 billion) commits in the commit-graph because of the way we track octopus edges, but this mistake has cost us more: we cannot store more than ~1.8 billion commits.
I'm sorry for this mixup, mostly because it is aesthetically unpleasant. Those extra 300 million commits mean less to me than having a clean file format.
3. The "generation" field isn't set on any of the commits. (I don't know what to make of this.)
This is a difference between 2.18 and current 'master', which merged ds/generation-numbers. Commit-graphs written with Git 2.18 have all generation numbers listed as GENERATION_NUMBER_ZERO (0), which lets future versions know that the generation number is not computed yet, so the next commit-graph write will compute the correct generation number.
I'll send a patch soon fixing these doc issues. Thanks, -Stolee