From: Derrick Stolee <derrickstolee@xxxxxxxxxx>

The corrected commit date was first documented in 5a3b130ca (doc: add
corrected commit date info, 2021-01-16) and it used an optional chunk to
augment the commit-graph format without modifying the file format
version.

One major benefit to this approach is that corrected commit dates could
be written without causing a backwards compatibility issue with Git
versions that do not understand them. The topological level was still
available in the CDAT chunk as it was before.

However, this causes a different issue: more data needs to be loaded
from disk when parsing commits from the commit-graph. In cases where
there is no significant algorithmic gain from using corrected commit
dates, commit walks take up to 20% longer because of this extra data.

Create a new file format version for the commit-graph format that
differs only in the CDAT chunk: it now stores corrected commit date
offsets. This brings our data back to normal and will demonstrate
performance gains in almost all cases.

Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx>
---
 .../technical/commit-graph-format.txt         | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 87971c27dd7..2cb48993314 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -36,7 +36,7 @@ HEADER:
       The signature is: {'C', 'G', 'P', 'H'}
 
   1-byte version number:
-      Currently, the only valid version is 1.
+      This version number can be 1 or 2.
 
   1-byte Hash Version
       We infer the hash length (H) from this value:
@@ -85,13 +85,22 @@ CHUNK DATA:
       position. If there are more than two parents, the second value
       has its most-significant bit on and the other bits store an array
       position into the Extra Edge List chunk.
-    * The next 8 bytes store the topological level (generation number v1)
-      of the commit and
-      the commit time in seconds since EPOCH. The generation number
-      uses the higher 30 bits of the first 4 bytes, while the commit
+    * The next 8 bytes store the generation number information of the
+      commit and the commit time in seconds since EPOCH. The generation
+      number uses the higher 30 bits of the first 4 bytes, while the commit
       time uses the 32 bits of the second 4 bytes, along with the lowest 2
       bits of the lowest byte, storing the 33rd and 34th bit of the
       commit time.
+      - If the commit-graph file format is version 1, then the higher 30
+        bits contain the topological level (generation number v1) for the
+        commit.
+      - If the commit-graph file format is version 2, then the higher 30
+        bits contain the corrected commit date offset (generation number
+        v2) for the commit, except if the offset cannot be stored within
+        29 bits. If the offset is too large for 29 bits, then the value
+        stored here has its most-significant bit on and the other bits
+        store the position of the corrected commit date in the Generation
+        Data Overflow chunk.
 
   Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
     * This list of 4-byte values store corrected commit date offsets for the
@@ -103,6 +112,9 @@ CHUNK DATA:
     * Generation Data chunk is present only when commit-graph file is written
       by compatible versions of Git and in case of split commit-graph chains,
       the topmost layer also has Generation Data chunk.
+    * This chunk does not exist if the commit-graph file format version is 2,
+      because the corrected commit date offset data is stored in the Commit
+      Data chunk.
 
   Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
     * This list of 8-byte values stores the corrected commit date offsets
-- 
gitgitgadget