The v3 pack index file as documented is complex enough that it is
difficult to implement correctly. I worked with bryan's preliminary
implementation and it took several passes to get the bugs out.

The complexity also requires multiple table look-ups to find all of
the information needed to translate from one kind of oid to another,
which can't be good for cache locality.

Even worse, coming up with a new index file version requires making
changes that have the potential to break anything that uses the index
of a pack file.

Instead of continuing to deal with the chance of breaking things
besides the oid mapping functionality, the additional complexity in
the file format, and the worry about whether the performance would be
reasonable, I stripped the problem down to its fundamental complexity
and came up with a file format that is exactly about mapping one kind
of oid to another, and that only supports two kinds of oids.

Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
---
 .../technical/hash-function-transition.txt | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index ed574810891c..4b937480848a 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -209,6 +209,46 @@ format described in linkgit:gitformat-pack[5], just like today. The
 content that is compressed and stored uses SHA-256 content instead of
 SHA-1 content.
 
+Per Pack Mapping Table
+~~~~~~~~~~~~~~~~~~~~~~
+Pack compat map (.compat) files have the following format:
+
+HEADER:
+	4-byte signature:
+	    The signature is: {'C', 'M', 'A', 'P'}
+	1-byte version number:
+	    Git only writes or recognizes version 1.
+	1-byte First Object Id Version
+	    We infer the length of object IDs (OIDs) from this value:
+		1 => SHA-1
+		2 => SHA-256
+	1-byte Second Object Id Version
+	    We infer the length of object IDs (OIDs) from this value:
+		1 => SHA-1
+		2 => SHA-256
+	1-byte reserved (must be zero)
+	4-byte number of object names contained in this mapping
+	1-byte length in bytes of shortened object names for the first object id.
+	    This is the shortest possible length needed to make the
+	    first object names unambiguous.
+	1-byte reserved (must be zero)
+	1-byte length in bytes of shortened object names for the second object id.
+	    This is the shortest possible length needed to make the
+	    second object names unambiguous.
+	1-byte reserved (must be zero)
+
+OBJECT NAME TABLES:
+	[Object name raw length + 4] * Number of object names
+	    This table is sorted by object name.
+	    Each entry in the table is formatted as:
+	    [20 or 32 byte] Object name
+	    4-byte index into the other object name table
+
+TRAILER:
+	checksum of the corresponding packfile, and
+
+	checksum of all of the above.
+
 Pack index
 ~~~~~~~~~~
 Pack index (.idx) files use a new v3 format that supports multiple
-- 
2.41.0
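To illustrate why the .compat layout proposed in this patch needs only one search plus one indexed jump per translation, here is a minimal C sketch of a reader. Beyond what the patch states, it assumes that multi-byte integers are big-endian (as in Git's other on-disk formats), that the first object name table directly follows the 16-byte header and the second table follows the first, and that each entry's 4-byte index names the same object's entry in the other table. The struct and the names `compat_map`, `compat_map_parse`, `compat_map_first_to_second` and `COMPAT_HEADER_SIZE` are hypothetical, and the trailer checksums are not verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of a .compat file (names are illustrative). */
struct compat_map {
	const unsigned char *first_table;  /* entries: first-format name + 4-byte index */
	const unsigned char *second_table; /* entries: second-format name + 4-byte index */
	uint32_t nr;                       /* number of object names */
	size_t first_len, second_len;      /* raw name lengths: 20 or 32 */
	unsigned first_abbrev, second_abbrev; /* shortest unambiguous lengths */
};

#define COMPAT_HEADER_SIZE 16 /* 4+1+1+1+1 + 4 + 1+1+1+1 bytes */

/* Object id version byte to raw length: 1 => SHA-1, 2 => SHA-256. */
static size_t oid_version_len(unsigned char v)
{
	return v == 1 ? 20 : v == 2 ? 32 : 0;
}

/* ASSUMPTION: multi-byte integers are big-endian, as in other Git files. */
static uint32_t get_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Validate the header and locate both tables. Returns 0 on success,
 * -1 if the file is malformed. Trailer checksums are NOT verified.
 * ASSUMPTION: the first table follows the header and the second
 * table follows the first.
 */
static int compat_map_parse(struct compat_map *m,
			    const unsigned char *buf, size_t size)
{
	size_t first_size, second_size;

	if (size < COMPAT_HEADER_SIZE || memcmp(buf, "CMAP", 4) || buf[4] != 1)
		return -1;
	m->first_len = oid_version_len(buf[5]);
	m->second_len = oid_version_len(buf[6]);
	if (!m->first_len || !m->second_len || buf[7] || buf[13] || buf[15])
		return -1; /* unknown oid version or nonzero reserved byte */
	m->nr = get_be32(buf + 8);
	m->first_abbrev = buf[12];
	m->second_abbrev = buf[14];

	first_size = (size_t)m->nr * (m->first_len + 4);
	second_size = (size_t)m->nr * (m->second_len + 4);
	if (size - COMPAT_HEADER_SIZE < first_size + second_size)
		return -1; /* truncated */
	m->first_table = buf + COMPAT_HEADER_SIZE;
	m->second_table = m->first_table + first_size;
	return 0;
}

/*
 * Binary-search the sorted first table for a first-format name; on a
 * hit, follow its 4-byte index (ASSUMED to name the same object's entry
 * in the other table) and return the second-format name, else NULL.
 */
static const unsigned char *compat_map_first_to_second(const struct compat_map *m,
						       const unsigned char *oid)
{
	size_t entry = m->first_len + 4;
	uint32_t lo = 0, hi = m->nr;

	assert(m->first_table && m->second_table); /* parsed successfully */
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		const unsigned char *e = m->first_table + (size_t)mid * entry;
		int cmp = memcmp(oid, e, m->first_len);

		if (!cmp) {
			uint32_t idx = get_be32(e + m->first_len);
			if (idx >= m->nr)
				return NULL; /* corrupt index */
			return m->second_table + (size_t)idx * (m->second_len + 4);
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}
```

Translating in the other direction is symmetric: search the second table and follow its index into the first. The shortened-name lengths are parsed but unused here; they would matter for abbreviated-name lookups, which this sketch does not attempt.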