[PATCH] Document pack v4 format

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 Written for my own education, but it may help people who are
 interested in the format. Most of it is gathered from commit
 messages, except the delta tree entries.
 
 .idx is not documented yet, but it does not change much and is not
 the focus right now anyway.

 Documentation/technical/pack-format-v4.txt (new) | 110 +++++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 Documentation/technical/pack-format-v4.txt

diff --git a/Documentation/technical/pack-format-v4.txt b/Documentation/technical/pack-format-v4.txt
new file mode 100644
index 0000000..9123a53
--- /dev/null
+++ b/Documentation/technical/pack-format-v4.txt
@@ -0,0 +1,110 @@
+Git pack v4 format
+==================
+
+== pack-*.pack files have the following format:
+
+   - A header appears at the beginning and consists of the following:
+
+     4-byte signature:
+	  The signature is: {'P', 'A', 'C', 'K'}
+
+     4-byte version number (network byte order): must be version
+     number 4
+
+     4-byte number of objects contained in the pack (network byte
+     order)
+
+   - (20 * nr_objects)-byte SHA-1 table: sorted in memcmp() order.
+
+   - Commit name dictionary: the uncompressed length in variable
+     length encoding, followed by the zlib-compressed dictionary.
+     Each entry consists of two prefix bytes storing the timezone,
+     followed by a NUL-terminated string.
+
+     Entries should be sorted by frequency so that the most frequent
+     entry has the smallest index, and thus the shortest variable
+     length encoding.
+
+   - Tree path dictionary: similar in format to the commit name
+     dictionary. Each entry consists of two prefix bytes storing the
+     entry mode, then a NUL-terminated path name. The same sort order
+     recommendation applies.
+
+   - The header is followed by a number of object entries, each of
+     which looks like this:
+
+     (undeltified representation)
+     n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+     [uncompressed data]
+     [compressed data]
+
+     (deltified representation)
+     n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+     base object name in SHA-1 reference encoding
+     compressed delta data
+
+     In undeltified format, blobs and tags have no uncompressed
+     data: all object content is compressed. Trees are not
+     compressed at all. Some headers in commits are stored
+     uncompressed; the rest is compressed.
+
+     All objects except trees are deltified and compressed the same
+     way as in v3. Trees, however, are deltified differently and use
+     the undeltified representation. See "Tree representation" below
+     for details.
+
+  - The trailer records a 20-byte SHA-1 checksum of all of the above.
+
+=== Commit representation
+
+  - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+
+  - Tree SHA-1 in SHA-1 reference encoding
+
+  - Parent count in variable length encoding
+
+  - Parent SHA-1s in SHA-1 reference encoding
+
+  - Author reference: an index, in variable length encoding, into the
+    commit name dictionary, which covers the name and the time zone.
+
+  - Author timestamp in variable length encoding
+
+  - Committer reference: an index, in variable length encoding, into
+    the commit name dictionary, which covers the name and the time
+    zone.
+
+  - Committer timestamp in variable length encoding
+
+  - Compressed data of remaining header and the body
+
+=== Tree representation
+
+  - n-byte type and length (4-bit type, (n-1)*7+4-bit length)
+
+  - Number of tree entries in variable length encoding
+
+  - That many tree entries, each consisting of:
+
+    Path component reference: an index, in variable length encoding,
+    into the tree path dictionary, which also covers the entry mode.
+
+    SHA-1 in SHA-1 reference encoding.
+
+A path component reference of zero marks a deltified portion, which
+has the following format:
+
+  - path component reference: zero
+
+  - index of the entry to copy from, in variable length encoding
+
+  - number of entries in variable length encoding
+
+  - base tree in SHA-1 reference encoding
+
+=== SHA-1 reference encoding
+
+This encoding stores a SHA-1 efficiently when it is already in the
+SHA-1 table. It starts with an index number in variable length
+encoding. If the number is not zero, its value minus one is the index
+into the SHA-1 table. If it is zero, the 20-byte SHA-1 follows.
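
The SHA-1 reference encoding described in the last section can be sketched in Python as follows. The document does not define "variable length encoding", so this sketch assumes the scheme implemented by `decode_varint()` in Git's `varint.c` (seven bits per byte, most significant group first, high bit as continuation flag, plus one added per continuation byte). Whether pack v4 uses exactly that scheme is an assumption here, and `decode_sha1_ref` is a hypothetical name.

```python
def decode_varint(buf, pos=0):
    """Decode one integer at buf[pos]; return (value, next_pos).

    Assumed encoding: modeled on decode_varint() in Git's varint.c.
    """
    c = buf[pos]
    pos += 1
    val = c & 0x7F
    while c & 0x80:          # high bit set: another byte follows
        val += 1             # offset that makes each encoding unique
        c = buf[pos]
        pos += 1
        val = (val << 7) + (c & 0x7F)
    return val, pos


def decode_sha1_ref(buf, pos, sha1_table):
    """Decode a SHA-1 reference; return (sha1_bytes, next_pos).

    Hypothetical helper: index 0 means a literal 20-byte SHA-1
    follows, any other value N refers to sha1_table[N - 1].
    """
    index, pos = decode_varint(buf, pos)
    if index == 0:
        sha1 = bytes(buf[pos:pos + 20])
        if len(sha1) != 20:
            raise ValueError("truncated literal SHA-1")
        return sha1, pos + 20
    return sha1_table[index - 1], pos
```

Under these assumptions, a reference to an object already in the table costs a single byte for the first 127 table entries, versus 21 bytes for an object outside the pack.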
-- 
1.8.2.83.gc99314b
