[PATCH 5/7] commit-graph: document file format v2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Derrick Stolee <derrickstolee@xxxxxxxxxx>

The corrected commit date was first documented in 5a3b130ca (doc: add
corrected commit date info, 2021-01-16) and it used an optional chunk to
augment the commit-graph format without modifying the file format
version.

One major benefit to this approach is that corrected commit dates could
be written without causing a backwards compatibility issue with Git
versions that do not understand them. The topological level was still
available in the CDAT chunk as it was before.

However, this causes a different issue: more data needs to be loaded
from disk when parsing commits from the commit-graph. In cases where
there is no significant algorithmic gain from using corrected commit
dates, commit walks take up to 20% longer because of this extra data.

Create a new file format version for the commit-graph format that
differs only in the CDAT chunk: it now stores corrected commit date
offsets. This brings our data back to normal and will demonstrate
performance gains in almost all cases.

Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx>
---
 .../technical/commit-graph-format.txt         | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 87971c27dd7..2cb48993314 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -36,7 +36,7 @@ HEADER:
       The signature is: {'C', 'G', 'P', 'H'}
 
   1-byte version number:
-      Currently, the only valid version is 1.
+      This version number can be 1 or 2.
 
   1-byte Hash Version
       We infer the hash length (H) from this value:
@@ -85,13 +85,22 @@ CHUNK DATA:
       position. If there are more than two parents, the second value
       has its most-significant bit on and the other bits store an array
       position into the Extra Edge List chunk.
-    * The next 8 bytes store the topological level (generation number v1)
-      of the commit and
-      the commit time in seconds since EPOCH. The generation number
-      uses the higher 30 bits of the first 4 bytes, while the commit
+    * The next 8 bytes store the generation number information of the
+      commit and the commit time in seconds since EPOCH. The generation
+      number uses the higher 30 bits of the first 4 bytes, while the commit
       time uses the 32 bits of the second 4 bytes, along with the lowest
       2 bits of the lowest byte, storing the 33rd and 34th bit of the
       commit time.
+      - If the commit-graph file format is version 1, then the higher 30
+	bits contain the topological level (generation number v1) for the
+	commit.
+      - If the commit-graph file format is version 2, then the higher 30
+	bits contain the corrected commit date offset (generation number
+	v2) for the commit, except if the offset cannot be stored within
+	29 bits. If the offset is too large for 29 bits, then the value
+	stored here has its most-significant bit on and the other bits
+	store the position of the corrected commit date in the Generation
+	Date Overflow chunk.
 
   Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
     * This list of 4-byte values store corrected commit date offsets for the
@@ -103,6 +112,9 @@ CHUNK DATA:
     * Generation Data chunk is present only when commit-graph file is written
       by compatible versions of Git and in case of split commit-graph chains,
       the topmost layer also has Generation Data chunk.
+    * This chunk does not exist if the commit-graph file format version is 2,
+      because the corrected commit date offset data is stored in the Commit
+      Data chunk.
 
   Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
     * This list of 8-byte values stores the corrected commit date offsets
-- 
gitgitgadget




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux