Re: [PATCH] provide advance warning of some future pack default changes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>+ * From v1.5.5, the pack.indexversion config option will default to 2,
>+   which is slightly more efficient, and makes repacking more immune to
>+   data corruptions.  Git older than version 1.5.2 may revert to version 1
>+   of the pack index with a manual "git index-pack" to be able to directly
>+   access corresponding pack files.

You might want to mention that it's slightly more TIME efficient,
but takes 16% more space (28 bytes per object rather than 24).

If it helps, I documented the v2 index file format (a lot stolen
from commit c553ca25bd60dc9fd50b8bc7bd329601b81cee66 message).
(Public domain, copyright abandoned, if it breaks you get to keep both
pieces, yadda yadda.)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index e5b31c8..a80baa4 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -1,9 +1,9 @@
 GIT pack format
 ===============
 
-= pack-*.pack file has the following format:
+= pack-*.pack files have the following format:
 
-   - The header appears at the beginning and consists of the following:
+   - A header appears at the beginning and consists of the following:
 
      4-byte signature:
          The signature is: {'P', 'A', 'C', 'K'}
@@ -34,18 +34,14 @@ GIT pack format
 
   - The trailer records 20-byte SHA1 checksum of all of the above.
 
-= pack-*.idx file has the following format:
+= Original (version 1) pack-*.idx files have the following format:
 
   - The header consists of 256 4-byte network byte order
     integers.  N-th entry of this table records the number of
     objects in the corresponding pack, the first byte of whose
-    object name are smaller than N.  This is called the
+    object name is less than or equal to N.  This is called the
     'first-level fan-out' table.
 
-    Observation: we would need to extend this to an array of
-    8-byte integers to go beyond 4G objects per pack, but it is
-    not strictly necessary.
-
   - The header is followed by sorted 24-byte entries, one entry
     per object in the pack.  Each entry is:
 
@@ -55,10 +51,6 @@ GIT pack format
 
     20-byte object name.
 
-    Observation: we would definitely need to extend this to
-    8-byte integer plus 20-byte object name to handle a packfile
-    that is larger than 4GB.
-
   - The file is concluded with a trailer:
 
     A copy of the 20-byte SHA1 checksum at the end of
@@ -68,31 +60,30 @@ GIT pack format
 
 Pack Idx file:
 
-	idx
-	    +--------------------------------+
-	    | fanout[0] = 2                  |-.
-	    +--------------------------------+ |
+	--  +--------------------------------+
+fanout	    | fanout[0] = 2 (for example)    |-.
+table	    +--------------------------------+ |
 	    | fanout[1]                      | |
 	    +--------------------------------+ |
 	    | fanout[2]                      | |
 	    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
-	    | fanout[255]                    | |
-	    +--------------------------------+ |
-main	    | offset                         | |
-index	    | object name 00XXXXXXXXXXXXXXXX | |
-table	    +--------------------------------+ |
-	    | offset                         | |
-	    | object name 00XXXXXXXXXXXXXXXX | |
-	    +--------------------------------+ |
-	  .-| offset                         |<+
-	  | | object name 01XXXXXXXXXXXXXXXX |
-	  | +--------------------------------+
-	  | | offset                         |
-	  | | object name 01XXXXXXXXXXXXXXXX |
-	  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-	  | | offset                         |
-	  | | object name FFXXXXXXXXXXXXXXXX |
-	  | +--------------------------------+
+	    | fanout[255] = total objects    |---.
+	--  +--------------------------------+ | |
+main	    | offset                         | | |
+index	    | object name 00XXXXXXXXXXXXXXXX | | |
+table	    +--------------------------------+ | |
+	    | offset                         | | |
+	    | object name 00XXXXXXXXXXXXXXXX | | |
+	    +--------------------------------+<+ |
+	  .-| offset                         |   |
+	  | | object name 01XXXXXXXXXXXXXXXX |   |
+	  | +--------------------------------+   |
+	  | | offset                         |   |
+	  | | object name 01XXXXXXXXXXXXXXXX |   |
+	  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
+	  | | offset                         |   |
+	  | | object name FFXXXXXXXXXXXXXXXX |   |
+	--| +--------------------------------+<--+
 trailer	  | | packfile checksum              |
 	  | +--------------------------------+
 	  | | idxfile checksum               |
@@ -116,3 +107,40 @@ Pack file entry: <+
 	  20-byte base object name SHA1 (the size above is the
 		size of the delta data that follows).
           delta data, deflated.
+
+
+= Version 2 pack-*.idx files support packs larger than 4 GiB, and
+  have some other reorganizations.  They have the format:
+
+  - A 4-byte magic number '\377tOc' which is an unreasonable
+    fanout[0] value.
+
+  - A 4-byte version number (= 2)
+
+  - A 256-entry fan-out table just like v1.
+
+  - A table of sorted 20-byte SHA1 object names.  These are
+    packed together without offset values to reduce the cache
+    footprint of the binary search for a specific object name.
+
+  - A table of 4-byte CRC32 values of the packed object data.
+    This is new in v2 so compressed data can be copied directly
+    from pack to pack during repacking withough undetected
+    data corruption.
+
+  - A table of 4-byte offset values (in network byte order).
+    These are usually 31-bit pack file offsets, but large
+    offsets are encoded as an index into the next table with
+    the msbit set.
+
+  - A table of 8-byte offset entries (empty for pack files less
+    than 2 GiB).  Pack files are organized with heavily used
+    objects toward the front, so most object references should
+    not need to refer to this table.
+
+  - The same trailer as a v1 pack file:
+
+    A copy of the 20-byte SHA1 checksum at the end of
+    corresponding packfile.
+
+    20-byte SHA1-checksum of all of the above.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux