[PATCH 01/32] doc hash-function-transition: A map file for mapping between sha1 and sha256

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> · Fri, 8 Sep 2023 18:10:18 -0500

The v3 pack index file as documented has a lot of complexity making it
difficult to implement correctly.  I worked with bryan's preliminary
implementation and it took several passes to get the bugs out.

The complexity also requires multiple table look-ups to find all of
the information that is needed to translate from one kind of oid to
another, which can't be good for cache locality.

Even worse, coming up with a new index file version requires making
changes that have the potential to break anything that uses the index
of a pack file.

Instead of continuing to deal with the chance of breaking things
besides the oid mapping functionality, the additional complexity in
the file format, and the worry about whether the performance would be
reasonable, I stripped the problem down to its fundamental complexity
and came up with a file format that is exactly about mapping one kind
of oid to another, and only supports two kinds of oids.

Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
---
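
To make the header layout described in this patch concrete, here is a
minimal reader sketch. It is not part of the patch and not Git code.
It assumes multi-byte integers are big-endian (the text below does not
state a byte order) and that the two object name tables follow the
header back to back, first OID kind first. The names
parse_compat_header and table_layout are hypothetical.

```python
import struct

# Object Id Version byte -> raw OID length in bytes (1 => SHA-1, 2 => SHA-256).
OID_RAW_LEN = {1: 20, 2: 32}

# Assumption: big-endian integers; the patch text does not specify byte order.
# Fields: signature, version, first OID version, second OID version, reserved,
# object count, first short length, reserved, second short length, reserved.
HEADER = struct.Struct(">4sBBBBIBBBB")  # 16 bytes, no padding


def parse_compat_header(data):
    """Decode and validate the 16-byte header; return its fields as a dict."""
    (sig, version, first_ver, second_ver, reserved0, count,
     first_short, reserved1, second_short, reserved2) = HEADER.unpack_from(data, 0)
    if sig != b"CMAP":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError("unsupported version")
    if reserved0 or reserved1 or reserved2:
        raise ValueError("reserved bytes must be zero")
    if first_ver not in OID_RAW_LEN or second_ver not in OID_RAW_LEN:
        raise ValueError("unknown object id version")
    return {
        "first_version": first_ver,
        "second_version": second_ver,
        "count": count,
        "first_short_len": first_short,
        "second_short_len": second_short,
    }


def table_layout(header):
    """Return ((offset, entry_size), (offset, entry_size)) for the two tables.

    Assumption: the tables are stored contiguously right after the header.
    Each entry is a raw object name plus a 4-byte index into the other table.
    """
    first_entry = OID_RAW_LEN[header["first_version"]] + 4
    second_entry = OID_RAW_LEN[header["second_version"]] + 4
    first_off = HEADER.size
    second_off = first_off + first_entry * header["count"]
    return (first_off, first_entry), (second_off, second_entry)
```

Under these assumptions, a SHA-1 to SHA-256 map with three objects has
24-byte entries in the first table and 36-byte entries in the second.
Because each table is sorted by object name, a lookup can binary-search
one table and follow the 4-byte index into the other.
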
 .../technical/hash-function-transition.txt    | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index ed574810891c..4b937480848a 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -209,6 +209,46 @@ format described in linkgit:gitformat-pack[5], just like
 today. The content that is compressed and stored uses SHA-256 content
 instead of SHA-1 content.
 
+Per Pack Mapping Table
+~~~~~~~~~~~~~~~~~~~~~~
+Pack compat map (.compat) files have the following format:
+
+HEADER:
+	4-byte signature:
+	    The signature is: {'C', 'M', 'A', 'P'}
+	1-byte version number:
+	    Git only writes or recognizes version 1.
+	1-byte First Object Id Version
+	    We infer the length of object IDs (OIDs) from this value:
+		1 => SHA-1
+		2 => SHA-256
+	1-byte Second Object Id Version
+	    We infer the length of object IDs (OIDs) from this value:
+		1 => SHA-1
+		2 => SHA-256
+	1-byte reserved (must be zero)
+	4-byte number of object names contained in this mapping
+	1-byte length in bytes of shortened object names for the first object id.
+	       This is the shortest possible length needed to make the
+	       first object names unambiguous.
+	1-byte reserved (must be zero)
+	1-byte length in bytes of shortened object names for the second object id.
+	       This is the shortest possible length needed to make the
+	       second object names unambiguous.
+	1-byte reserved (must be zero)
+
+OBJECT NAME TABLES:
+	[Object name raw length + 4]*Number of object names
+	   This table is sorted by object name
+	   Each entry in the table is formatted as:
+		[20 or 32 byte] Object name
+		4-byte index into the other object name table
+
+TRAILER:
+	checksum of the corresponding packfile, and
+
+	checksum of all of the above.
+
 Pack index
 ~~~~~~~~~~
 Pack index (.idx) files use a new v3 format that supports multiple
-- 
2.41.0