On Fri, 6 Sep 2013, Nguyễn Thái Ngọc Duy wrote:

>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
> ---
>  Should be up to date with Nico's latest implementation and also cover
>  additions to the format that everybody seems to agree on:
>
>  - new types for canonical trees and commits
>  - sha-1 table covering missing objects in thin packs

Great!  I've merged this into my branch with the following amendment:

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 1980794..d0c2cde 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -81,6 +81,13 @@ Git pack format
      completing thin packs or preserving somewhat ill-formatted
      objects.
 
+     Thin packs are used for transferring on the wire and may omit delta
+     base objects, expecting the receiver to add them at the end of the pack
+     before writing to disk. The number of objects contained in the pack
+     header must account for those omitted objects in any case. To indicate
+     no more objects are included in a thin pack, a "type 0" byte
+     indicator is inserted after the last transmitted object.
+
    - The trailer records 20-byte SHA-1 checksum of all of the above.
 
 === Pack v4 tables
@@ -88,10 +95,7 @@ Git pack format
    - A table of sorted SHA-1 object names for all objects contained in
      the on-disk pack.
 
-     Thin packs are used for transferring on the wire and may omit base
-     objects, expecting the receiver to add them before writing to
-     disk. The SHA-1 table in thin packs must include the omitted objects
-     as well.
+     The SHA-1 table in thin packs must include the omitted objects as well.
 
      This table can be referred to using "SHA-1 reference encoding": the
      index, in variable length encoding, to this table.
@@ -158,7 +162,7 @@ Git pack format
      entry (LSB not set), or an instruction to copy tree entries from
      another tree (LSB set).
-     For copying from another tree, is the LSB of the second number is
+     For copying from another tree, if the LSB of the second number is
      set, it will be followed by a base tree SHA-1. If it's not set,
      the last base tree will be used.

> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
> index 8e5bf60..c5327ff 100644
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> @@ -1,7 +1,7 @@
>  Git pack format
>  ===============
>
> -== pack-*.pack files have the following format:
> +== pack-*.pack files version 2 and 3 have the following format:
>
>     - A header appears at the beginning and consists of the following:
>
> @@ -36,6 +36,132 @@ Git pack format
>
>     - The trailer records 20-byte SHA-1 checksum of all of the above.
>
> +== pack-*.pack files version 4 have the following format:
> +
> +   - A header appears at the beginning and consists of the following:
> +
> +     4-byte signature:
> +         The signature is: {'P', 'A', 'C', 'K'}
> +
> +     4-byte version number (network byte order): must be 4
> +
> +     4-byte number of objects contained in the pack (network byte order)
> +
> +   - A series of tables, described separately.
> +
> +   - The tables are followed by number of object entries, each of
> +     which looks like below:
> +
> +     (undeltified representation)
> +     n-byte type and length (4-bit type, (n-1)*7+4-bit length)
> +     data
> +
> +     (deltified representation)
> +     n-byte type and length (4-bit type, (n-1)*7+4-bit length)
> +     base object name in SHA-1 reference encoding
> +     compressed delta data
> +
> +     "type" is used to determine object type. Commit has type 1, tree
> +     2, blob 3, tag 4, ref-delta 7, canonical-commit 9 (commit type
> +     with bit 3 set), canonical-tree 10 (tree type with bit 3 set).
> +     Compared to v2, ofs-delta type is not used, and canonical-commit
> +     and canonical-tree are new types.
> +
> +     In undeltified format, blobs and tags ares compressed.
> +     Trees are not compressed at all. Some headers in commits are stored
> +     uncompressed, the rest is compressed. Tree and commit
> +     representations are described in detail separately.
> +
> +     Blobs and tags are deltified and compressed the same way in
> +     v3. Commits are not delitifed. Trees are deltified using
> +     undeltified representation.
> +
> +     Trees and commits in canonical types are in the same format as
> +     v2: in canonical format and deflated. They can be used for
> +     completing thin packs or preserving somewhat ill-formatted
> +     objects.
> +
> +   - The trailer records 20-byte SHA-1 checksum of all of the above.
> +
> +=== Pack v4 tables
> +
> +   - A table of sorted SHA-1 object names for all objects contained in
> +     the on-disk pack.
> +
> +     Thin packs are used for transferring on the wire and may omit base
> +     objects, expecting the receiver to add them before writing to
> +     disk. The SHA-1 table in thin packs must include the omitted objects
> +     as well.
> +
> +     This table can be referred to using "SHA-1 reference encoding": the
> +     index, in variable length encoding, to this table.
> +
> +   - Ident table: the uncompressed length in variable encoding,
> +     followed by zlib-compressed dictionary. Each entry consists of
> +     two prefix bytes storing timezone followed by a NUL-terminated
> +     string.
> +
> +     Entries should be sorted by frequency so that the most frequent
> +     entry has the smallest index, thus most efficient variable
> +     encoding.
> +
> +     The table can be referred to using "ident reference encoding": the
> +     index number, in variable length encoding, to this table.
> +
> +   - Tree path table: the same format to ident table. Each entry
> +     consists of two prefix bytes storing tree entry mode, then a
> +     NUL-terminated path name. Same sort order recommendation applies.
> +
> +=== Commit representation
> +
> +   - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
> +
> +   - Tree SHA-1 in SHA-1 reference encoding
> +
> +   - Parent count in variable length encoding
> +
> +   - Parent SHA-1s in SHA-1 reference encoding
> +
> +   - Author reference in ident reference encoding
> +
> +   - Author timestamp in variable length encoding
> +
> +   - Committer reference in ident reference encoding
> +
> +   - Committer timestamp, encoded as a difference against author
> +     timestamp with the LSB used to indicate negative difference.
> +
> +   - Compressed data of remaining header and the body
> +
> +=== Tree representation
> +
> +   - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
> +
> +   - Number of tree entries in variable length encoding
> +
> +   - A number of entries, each can be in either forms
> +
> +     - INT(path_index << 1)        INT(sha1_index)
> +
> +     - INT((entry_start << 1) | 1) INT(entry_count << 1)
> +
> +     - INT((entry_start << 1) | 1) INT((entry_count << 1) | 1) INT(base_sha1_index)
> +
> +     INT() denotes a number in variable length encoding. path_index is
> +     the index to the tree path table. sha1_index is the index to the
> +     SHA-1 table. entry_start is the first tree entry to copy
> +     from. entry_count is the number of tree entries to
> +     copy. base_sha1_index is the index to SHA-1 table of the base tree
> +     to copy from.
> +
> +     The LSB of the first number indicates whether it's a plain tree
> +     entry (LSB not set), or an instruction to copy tree entries from
> +     another tree (LSB set).
> +
> +     For copying from another tree, is the LSB of the second number is
> +     set, it will be followed by a base tree SHA-1. If it's not set,
> +     the last base tree will be used.
> +
>  == Original (version 1) pack-*.idx files have the following format:
>
>     - The header consists of 256 4-byte network byte order
> @@ -160,3 +286,8 @@ Pack file entry: <+
>      corresponding packfile.
>
>      20-byte SHA-1-checksum of all of the above.
> +
> +== Version 3 pack-*.idx files support only *.pack files version 4. The
> +   format is the same as version 2 except that the table of sorted
> +   20-byte SHA-1 object names is missing in the .idx files. The same
> +   table exists in .pack files and will be used instead.
> --
> 1.8.2.83.gc99314b
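
For anyone reading along, the three tree entry forms in the quoted "Tree representation" section can be illustrated with a small Python sketch. This is not the code from either implementation. It assumes the INT() values have already been decoded from their variable length encoding (the byte layout is not spelled out in this thread), that the leading entry count has already been consumed, and the function name decode_tree_entries and the tuple shapes are made up for illustration only.

```python
def decode_tree_entries(ints):
    """Classify already-decoded INT() values into tree operations.

    Hypothetical helper: returns a list of
      ("entry", path_index, sha1_index)                     for plain entries
      ("copy", entry_start, entry_count, base_sha1_index)   for copy instructions
    """
    ops = []
    last_base = None  # SHA-1 table index of the most recently named base tree
    it = iter(ints)
    try:
        for first in it:
            second = next(it)
            if first & 1 == 0:
                # LSB of the first number not set: plain tree entry,
                # INT(path_index << 1) INT(sha1_index)
                ops.append(("entry", first >> 1, second))
                continue
            # LSB of the first number set: copy entries from another tree.
            if second & 1:
                # LSB of the second number set: a base tree index follows.
                last_base = next(it)
            elif last_base is None:
                # Assumption: reusing "the last base tree" needs one to exist.
                raise ValueError("copy instruction without a previous base tree")
            ops.append(("copy", first >> 1, second >> 1, last_base))
    except StopIteration:
        # next(it) ran out of numbers in the middle of an entry.
        raise ValueError("truncated tree entry sequence")
    return ops


# One plain entry, one copy naming base tree 9, one copy reusing that base.
example = [3 << 1, 7, (2 << 1) | 1, (5 << 1) | 1, 9, (0 << 1) | 1, 1 << 1]
print(decode_tree_entries(example))
```

The sketch only classifies entries. Resolving path_index against the tree path table and expanding a copy into actual entries from the base tree are left out.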