Re: Understanding version 4 packs

On Sat, Mar 24, 2007 at 07:24:17PM -0400, Nicolas Pitre wrote:
> On Sat, 24 Mar 2007, Peter Eriksen wrote:
> 
> > There is a new tree type called OBJ_DICT_TREE, which looks something
> > like the following:
> > 
> > +-----------------+------------------------------------------------+----
> > |  Table offset   |  SHA-1 of the blob corresponding to the path.  | ...
> > +-----------------+------------------------------------------------+----
> >       6 bytes                     20 bytes
> 
> Actually it is a 2-byte index in the path table, and a 4-byte index in a 
> common SHA1 table.  So each tree entry is 6 bytes total.

What happens to paths that do not have a corresponding entry in the
path name table because they are not among the 65535 most frequent
paths in the pack?

> > The index (.idx) files are extended to have a 4 byte pointer to the
> > offset of this file name table in the pack file for easy lookup.
> 
> Right.  And it will lose the SHA1 entries since they are already 
> available in the pack.

Does this mean that the current index format will change from:

  - The header is followed by sorted 24-byte entries, one entry
    per object in the pack.  Each entry is:

    4-byte network byte order integer, recording where the
    object is stored in the packfile as the offset from the
    beginning.

    20-byte object name.

to just 4-byte entries, and do the SHA-1 entries then live in that extra
table of SHA-1s referenced by OBJ_DICT_TREE objects in the pack file?

Regards,

Peter

P.S. I have updated my description of the pack format. Any comments are
welcome.

On-disk format of version 4 packs (v0.1)
========================================

There is a file name table, EXT_OBJ_FILENAME_TABLE, which may be placed
anywhere in the pack file, as long as it comes before every
OBJ_DICT_TREE object that references it, so that the pack can be
streamed easily. It uses the format:

+-------------------------------+
|  Compressed file name table   |
+-------------------------------+

The uncompressed file name table contains NR_ENTRIES entries,
and looks like this:

+------------+------+--------------+------+--------------------+----
| NR_ENTRIES | MODE |  Full path 1 | MODE | Full path 2        | ...
+------------+------+--------------+------+--------------------+----
   4 bytes    2 bytes   n1 bytes    2 bytes     n2 bytes     

MODE is a network-byte-order integer representing the mode of the path,
and the path is a variable length, null-terminated string.
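
A minimal sketch of reading this layout, under some assumptions that
are not stated above: the compression is taken to be zlib, NR_ENTRIES
is taken to be in network byte order like MODE, and build_name_table is
a hypothetical helper that exists only to produce demo input.

```python
import struct
import zlib

def parse_name_table(raw: bytes):
    """Parse an uncompressed file name table into (mode, path) pairs."""
    # NR_ENTRIES: 4 bytes; network byte order is an assumption here.
    (nr_entries,) = struct.unpack_from(">I", raw, 0)
    pos = 4
    entries = []
    for _ in range(nr_entries):
        # MODE: 2-byte network-byte-order integer.
        (mode,) = struct.unpack_from(">H", raw, pos)
        pos += 2
        # Path: variable-length, null-terminated string.
        end = raw.index(b"\0", pos)
        entries.append((mode, raw[pos:end]))
        pos = end + 1
    return entries

def build_name_table(entries):
    """Hypothetical inverse of parse_name_table, used only for the demo."""
    out = [struct.pack(">I", len(entries))]
    for mode, path in entries:
        out.append(struct.pack(">H", mode) + path + b"\0")
    return b"".join(out)

# Demo input; zlib is assumed as the compression method.
sample = [
    (0o100644, b"Makefile"),
    (0o100644, b"README"),
    (0o100755, b"t/test-lib.sh"),
]
packed = zlib.compress(build_name_table(sample))
table = parse_name_table(zlib.decompress(packed))
```

Because the paths are null-terminated, the table has to be walked
sequentially once to find where each entry starts.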

The table is sorted by path, then by mode, for easy binary lookup, and
so that pointers into this table can be compared directly instead of
comparing the corresponding paths and modes. This table contains the
65535 most frequently used paths in the entire pack.
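
The sorting property can be illustrated with a small sketch. The table
is modelled here as an in-memory list of (path, mode) tuples, and the
lookup helper is a hypothetical name, not something from the pack code.

```python
import bisect

# Entries are (path, mode) tuples, kept sorted by path, then mode.
table = sorted([
    (b"Makefile", 0o100644),
    (b"README", 0o100644),
    (b"t/test-lib.sh", 0o100644),
    (b"t/test-lib.sh", 0o100755),
])

def lookup(table, path, mode):
    """Return the index of (path, mode), or None if it is not in the table."""
    i = bisect.bisect_left(table, (path, mode))
    if i < len(table) and table[i] == (path, mode):
        return i
    return None

a = lookup(table, b"README", 0o100644)
b = lookup(table, b"t/test-lib.sh", 0o100755)
missing = lookup(table, b"no/such/path", 0o100644)

# Comparing two indexes gives the same answer as comparing the
# (path, mode) pairs they stand for, because the table is sorted.
same_order = (a < b) == (table[a] < table[b])
```

A path that is not in the table simply has no index, which is the case
the question about the 65535 limit is asking about.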

There is a new tree type called OBJ_DICT_TREE, which looks like the
following:

+--------+----------------+----
| P offs |   SHA-1 offs   | ...
+--------+----------------+----
  2 bytes      4 bytes

That is, each entry contains a 2-byte index into the path table, and a
corresponding 4-byte index into a SHA-1 table.
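
As a sketch, the body of such an object could be decoded as below. The
byte order of the two indexes is not stated above; network byte order
is assumed, and decode_dict_tree is a hypothetical name.

```python
import struct

# One entry: 2-byte path table index, 4-byte SHA-1 table index.
# Network byte order is an assumption.
ENTRY = struct.Struct(">HI")

def decode_dict_tree(body: bytes):
    """Split an OBJ_DICT_TREE body into (path_index, sha1_index) pairs."""
    if len(body) % ENTRY.size:
        raise ValueError("body length is not a multiple of the entry size")
    return [ENTRY.unpack_from(body, off)
            for off in range(0, len(body), ENTRY.size)]

# Demo: two entries, 12 bytes in total.
body = ENTRY.pack(0, 7) + ENTRY.pack(2, 70000)
entries = decode_dict_tree(body)
```

The fixed 6-byte entry size means entry k starts at byte offset 6 * k,
so no scanning is needed to reach a given entry.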

These new tree objects will remain uncompressed in the pack file, but
are sorted with, and deltaed against, other tree objects. All normal
tree objects are converted to OBJ_DICT_TREE when packing, and are
converted back on the fly for callers that need an ordinary OBJ_TREE.
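
A rough sketch of the conversion back to an ordinary tree, with the
tables held as in-memory lists. An ordinary tree entry is the octal
mode in ASCII, a space, the entry name, a null byte, and the 20-byte
binary SHA-1. How a full path from the table maps to an entry name is
not described above; taking the last path component is purely an
assumption, and to_plain_tree is a hypothetical name.

```python
def to_plain_tree(dict_entries, path_table, sha1_table):
    """Expand (path_index, sha1_index) pairs into ordinary tree bytes."""
    out = []
    for path_idx, sha1_idx in dict_entries:
        mode, path = path_table[path_idx]
        # Assumption: the entry name is the last component of the path.
        name = path.rsplit(b"/", 1)[-1]
        header = format(mode, "o").encode("ascii") + b" " + name + b"\0"
        out.append(header + sha1_table[sha1_idx])
    return b"".join(out)

# Demo tables: (mode, path) pairs and raw 20-byte SHA-1 values.
path_table = [
    (0o100644, b"Makefile"),
    (0o40000, b"t"),
    (0o100755, b"t/test-lib.sh"),
]
sha1_table = [b"\xaa" * 20, b"\xbb" * 20]

# A root tree holding the blob "Makefile" and the subtree "t".
root = to_plain_tree([(0, 0), (1, 1)], path_table, sha1_table)
```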

The index (.idx) files are extended to have a 4-byte pointer to the
offset of this file name table in the pack file for easy lookup.

There is a similar table, EXT_OBJ_IDENT_TABLE, of common strings in
commit objects (e.g. author and timezone), and a new object type,
OBJ_DICT_COMMIT, but I have not understood those quite yet.