[PATCH v3 5/4] commit-graph: declare bankruptcy on GDAT chunks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The Generation Data (GDAT) and Generation Data Overflow (GDOV) chunks
store corrected commit date offsets, used for generation number v2.
Recent changes have demonstrated that previous versions of Git were
incorrectly parsing data from these chunks, but might have also been
writing them incorrectly.

I asserted [1] that the previous fixes were sufficient because the known
reasons for incorrectly writing generation number v2 data relied on
parsing the information incorrectly out of a commit-graph file, but the
previous versions of Git were not reading the generation number v2 data.

However, Patrick demonstrated [2] a case where in split commit-graphs
across an alternate boundary (and possibly some other special
conditions) it was possible to have a commit-graph that was generated by
a previous version of Git have incorrect generation number v2 data which
results in errors like the following:

  commit-graph generation for commit <oid> is 1623273624 < 1623273710

[1] https://lore.kernel.org/git/f50e74f0-9ffa-f4f2-4663-269801495ed3@xxxxxxxxxx/
[2] https://lore.kernel.org/git/Yh93vOkt2DkrGPh2@ncase/

Clearly, there is something else going on. The situation is not
completely understood, but the errors do not reproduce if the
commit-graphs are all generated by a Git version including these recent
fixes.

If we cannot trust the existing data in the GDAT and GDOV chunks, then
we can alter the format to change the chunk IDs for these chunks. This
causes the new version of Git to silently ignore the older chunks (and
disabling generation number v2 in the process) while writing new
commit-graph files with correct data in the GDA2 and GDO2 chunks.

Update commit-graph-format.txt including a historical note about these
deprecated chunks.

Reported-by: Patrick Steinhardt <ps@xxxxxx>
Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx>
---
 Documentation/technical/commit-graph-format.txt | 12 ++++++++++--
 commit-graph.c                                  |  4 ++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 87971c27dd7..484b185ba98 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -93,7 +93,7 @@ CHUNK DATA:
       2 bits of the lowest byte, storing the 33rd and 34th bit of the
       commit time.
 
-  Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
+  Generation Data (ID: {'G', 'D', 'A', '2' }) (N * 4 bytes) [Optional]
     * This list of 4-byte values store corrected commit date offsets for the
       commits, arranged in the same order as commit data chunk.
     * If the corrected commit date offset cannot be stored within 31 bits,
@@ -104,7 +104,7 @@ CHUNK DATA:
       by compatible versions of Git and in case of split commit-graph chains,
       the topmost layer also has Generation Data chunk.
 
-  Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
+  Generation Data Overflow (ID: {'G', 'D', 'O', '2' }) [Optional]
     * This list of 8-byte values stores the corrected commit date offsets
       for commits with corrected commit date offsets that cannot be
       stored within 31 bits.
@@ -156,3 +156,11 @@ CHUNK DATA:
 TRAILER:
 
 	H-byte HASH-checksum of all of the above.
+
+== Historical Notes:
+
+The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have
+the number '2' in their chunk IDs because a previous version of Git wrote
+possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By
+changing the IDs, newer versions of Git will silently ignore those older
+chunks and write the new information without trusting the incorrect data.
diff --git a/commit-graph.c b/commit-graph.c
index b86a6a634fe..fb2ced0bd6d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -39,8 +39,8 @@ void git_test_write_commit_graph_or_die(void)
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
-#define GRAPH_CHUNKID_GENERATION_DATA 0x47444154 /* "GDAT" */
-#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f56 /* "GDOV" */
+#define GRAPH_CHUNKID_GENERATION_DATA 0x47444132 /* "GDA2" */
+#define GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW 0x47444f32 /* "GDO2" */
 #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */
 #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */
 #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
-- 
2.35.1.138.gfc5de29e9e6







[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux