On Fri, 14 Dec 2007, linux@xxxxxxxxxxx wrote:

> >+ * From v1.5.5, the pack.indexversion config option will default to 2,
> >+   which is slightly more efficient, and makes repacking more immune to
> >+   data corruptions.  Git older than version 1.5.2 may revert to version 1
> >+   of the pack index with a manual "git index-pack" to be able to directly
> >+   access corresponding pack files.
>
> You might want to mention that it's slightly more TIME efficient,
> but takes 16% more space (28 bytes per object rather than 24).

Well, sure, but not now.  This is just an advance warning of what the
next release after this one will do.

> If it helps, I documented the v2 index file format (a lot stolen
> from commit c553ca25bd60dc9fd50b8bc7bd329601b81cee66 message).
> (Public domain, copyright abandoned, if it breaks you get to keep both
> pieces, yadda yadda.)

If anything:

Acked-by: Nicolas Pitre <nico@xxxxxxx>

> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
> index e5b31c8..a80baa4 100644
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> @@ -1,9 +1,9 @@
>  GIT pack format
>  ===============
>
> -= pack-*.pack file has the following format:
> += pack-*.pack files have the following format:
>
> -   - The header appears at the beginning and consists of the following:
> +   - A header appears at the beginning and consists of the following:
>
>       4-byte signature:
>           The signature is: {'P', 'A', 'C', 'K'}
> @@ -34,18 +34,14 @@ GIT pack format
>
>    - The trailer records 20-byte SHA1 checksum of all of the above.
>
> -= pack-*.idx file has the following format:
> += Original (version 1) pack-*.idx files have the following format:
>
>    - The header consists of 256 4-byte network byte order
>      integers.  N-th entry of this table records the number of
>      objects in the corresponding pack, the first byte of whose
> -    object name are smaller than N.  This is called the
> +    object name is less than or equal to N.  This is called the
>      'first-level fan-out' table.
>
> -    Observation: we would need to extend this to an array of
> -    8-byte integers to go beyond 4G objects per pack, but it is
> -    not strictly necessary.
> -
>    - The header is followed by sorted 24-byte entries, one entry
>      per object in the pack.  Each entry is:
>
> @@ -55,10 +51,6 @@ GIT pack format
>
>          20-byte object name.
>
> -    Observation: we would definitely need to extend this to
> -    8-byte integer plus 20-byte object name to handle a packfile
> -    that is larger than 4GB.
> -
>    - The file is concluded with a trailer:
>
>          A copy of the 20-byte SHA1 checksum at the end of
> @@ -68,31 +60,30 @@ GIT pack format
>
>  Pack Idx file:
>
> -        idx
> -            +--------------------------------+
> -            | fanout[0] = 2                  |-.
> -            +--------------------------------+ |
> +        --  +--------------------------------+
> +fanout      | fanout[0] = 2 (for example)    |-.
> +table       +--------------------------------+ |
>              | fanout[1]                      | |
>              +--------------------------------+ |
>              | fanout[2]                      | |
>              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
> -            | fanout[255]                    | |
> -            +--------------------------------+ |
> -main        | offset                         | |
> -index       | object name 00XXXXXXXXXXXXXXXX | |
> -table       +--------------------------------+ |
> -            | offset                         | |
> -            | object name 00XXXXXXXXXXXXXXXX | |
> -            +--------------------------------+ |
> -          .-| offset                         |<+
> -          | | object name 01XXXXXXXXXXXXXXXX |
> -          | +--------------------------------+
> -          | | offset                         |
> -          | | object name 01XXXXXXXXXXXXXXXX |
> -          | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -          | | offset                         |
> -          | | object name FFXXXXXXXXXXXXXXXX |
> -          | +--------------------------------+
> +            | fanout[255] = total objects    |---.
> +        --  +--------------------------------+ | |
> +main        | offset                         | | |
> +index       | object name 00XXXXXXXXXXXXXXXX | | |
> +table       +--------------------------------+ | |
> +            | offset                         | | |
> +            | object name 00XXXXXXXXXXXXXXXX | | |
> +            +--------------------------------+<+ |
> +          .-| offset                         |   |
> +          | | object name 01XXXXXXXXXXXXXXXX |   |
> +          | +--------------------------------+   |
> +          | | offset                         |   |
> +          | | object name 01XXXXXXXXXXXXXXXX |   |
> +          | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
> +          | | offset                         |   |
> +          | | object name FFXXXXXXXXXXXXXXXX |   |
> +        --| +--------------------------------+<--+
>  trailer   | | packfile checksum              |
>            | +--------------------------------+
>            | | idxfile checksum               |
> @@ -116,3 +107,40 @@ Pack file entry: <+
>          20-byte base object name SHA1 (the size above is the
>                  size of the delta data that follows).
>            delta data, deflated.
> +
> +
> += Version 2 pack-*.idx files support packs larger than 4 GiB, and
> +  have some other reorganizations.  They have the format:
> +
> +  - A 4-byte magic number '\377tOc' which is an unreasonable
> +    fanout[0] value.
> +
> +  - A 4-byte version number (= 2)
> +
> +  - A 256-entry fan-out table just like v1.
> +
> +  - A table of sorted 20-byte SHA1 object names.  These are
> +    packed together without offset values to reduce the cache
> +    footprint of the binary search for a specific object name.
> +
> +  - A table of 4-byte CRC32 values of the packed object data.
> +    This is new in v2 so compressed data can be copied directly
> +    from pack to pack during repacking without undetected
> +    data corruption.
> +
> +  - A table of 4-byte offset values (in network byte order).
> +    These are usually 31-bit pack file offsets, but large
> +    offsets are encoded as an index into the next table with
> +    the msbit set.
> +
> +  - A table of 8-byte offset entries (empty for pack files less
> +    than 2 GiB).  Pack files are organized with heavily used
> +    objects toward the front, so most object references should
> +    not need to refer to this table.
> +
> +  - The same trailer as a v1 pack file:
> +
> +    A copy of the 20-byte SHA1 checksum at the end of
> +    corresponding packfile.
> +
> +    20-byte SHA1-checksum of all of the above.
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


Nicolas
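
For anyone who wants to poke at a v2 index by hand, here is a minimal
Python sketch of a reader for the layout described in the quoted patch.
It is an illustration only, not git's implementation: the names
parse_idx_v2 and find_offset are made up for this example, it assumes
the whole index file is already in memory, it infers the size of the
8-byte offset table from the file length, and it does not verify the
CRC32 values or either checksum.

```python
import struct
from bisect import bisect_left

IDX_V2_MAGIC = b"\377tOc"   # magic number from the format description
HEADER = 8                  # 4-byte magic + 4-byte version
FANOUT = 256 * 4            # 256 network-byte-order 4-byte integers
TRAILER = 40                # packfile checksum + idxfile checksum


def parse_idx_v2(data: bytes) -> dict:
    """Split an in-memory v2 pack index into its tables (hypothetical helper)."""
    if data[:4] != IDX_V2_MAGIC:
        raise ValueError("not a v2 index (magic number missing)")
    (version,) = struct.unpack(">I", data[4:8])
    if version != 2:
        raise ValueError(f"unsupported index version {version}")

    fanout = struct.unpack(">256I", data[HEADER:HEADER + FANOUT])
    n = fanout[255]         # last fan-out entry is the total object count
    pos = HEADER + FANOUT

    # Sorted 20-byte object names, packed together without offsets.
    names = [data[pos + 20 * i: pos + 20 * (i + 1)] for i in range(n)]
    pos += 20 * n

    # One CRC32 per object, in the same order as the names.
    crcs = struct.unpack(f">{n}I", data[pos:pos + 4 * n])
    pos += 4 * n

    # One 4-byte offset per object; msbit set means "index into next table".
    small = struct.unpack(f">{n}I", data[pos:pos + 4 * n])
    pos += 4 * n

    # Assumption: whatever lies between here and the trailer is the
    # table of 8-byte offsets.
    n_large = (len(data) - TRAILER - pos) // 8
    large = struct.unpack(f">{n_large}Q", data[pos:pos + 8 * n_large])

    offsets = [large[o & 0x7FFFFFFF] if o & 0x80000000 else o for o in small]
    return {
        "fanout": fanout,
        "names": names,
        "crcs": crcs,
        "offsets": offsets,
        "pack_checksum": data[-40:-20],
        "idx_checksum": data[-20:],
    }


def find_offset(idx: dict, name: bytes):
    """Return the pack offset for a 20-byte object name, or None if absent."""
    first = name[0]
    # fanout[N] counts objects whose first byte is <= N, so the names
    # starting with byte `first` occupy the slice [fanout[first-1], fanout[first]).
    lo = idx["fanout"][first - 1] if first else 0
    hi = idx["fanout"][first]
    i = bisect_left(idx["names"], name, lo, hi)
    if i < hi and idx["names"][i] == name:
        return idx["offsets"][i]
    return None
```

Calling find_offset(parse_idx_v2(data), name) with the raw bytes of an
index file and a 20-byte binary object name gives the offset into the
pack, with the fan-out table narrowing the binary search to names that
share the first byte.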