Re: GSoC - Designing a faster index format

elton sky <eltonsky9404@xxxxxxxxx> · Wed, 4 Apr 2012 18:26:21 +1000

Hi Nguyen,

A few questions,

> -- 8< --
> GIT index format
> ================
>
> This format replaces the old "DIRC" format. Compared to the old
> format, which is essentially a sorted list of pathnames, this one:
>
>  - is tree-based
>  - uses crc32 as checksum
>  - only verifies integrity of the parts that git accesses, instead of the
>   whole file
>  - appends changes to the end
>  - allows index versioning
>
> Updates can be made directly to the index by appending to the end. The
> index is traversed by locating the root tree block from the trailer.
> When a path is updated, all tree blocks on the path from the root down
> to it are updated and appended to the end, then a new trailer (with
> generation increased by one) is written to conclude the index.
>
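One way to read the update rule is as copy-on-write along the root-to-entry path: nothing already written is modified, which is also what keeps older trailers usable. A minimal in-memory sketch under that reading; `struct index_log`, `update_path()` and the other names are hypothetical, blocks are modelled as arrays of child offsets rather than serialized bytes, and only the newest trailer is tracked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHILDREN 4
#define MAX_RECORDS 64
#define MAX_DEPTH 16

/* Hypothetical in-memory model: every block is appended, never modified
 * in place. A block only keeps child offsets and a counter standing in
 * for its real entries. */
struct tree_block {
    uint32_t child[MAX_CHILDREN]; /* offsets of subtree blocks */
    uint32_t version;             /* stands in for the block's contents */
};

struct trailer {
    uint16_t generation;
    uint32_t root;                /* offset of the root tree block */
};

struct index_log {
    struct tree_block blocks[MAX_RECORDS];
    uint32_t nr_blocks;
    struct trailer last;          /* newest trailer only */
};

static void log_init(struct index_log *idx)
{
    memset(idx, 0, sizeof(*idx));
}

static uint32_t append_block(struct index_log *idx, const struct tree_block *b)
{
    assert(idx->nr_blocks < MAX_RECORDS);
    idx->blocks[idx->nr_blocks] = *b;
    return idx->nr_blocks++;      /* offset of the block just appended */
}

/* Update the entry reached by following child slots path[0..depth-1]
 * from the current root. The leaf block and every ancestor are copied
 * and appended (leaf first, so each copied parent can point at its new
 * child), then the trailer is bumped to generation + 1 with the new root.
 * Old blocks stay untouched, so the old root still reaches the old tree. */
static uint32_t update_path(struct index_log *idx, const int *path, int depth)
{
    uint32_t chain[MAX_DEPTH + 1];

    assert(depth <= MAX_DEPTH);
    chain[0] = idx->last.root;
    for (int i = 0; i < depth; i++)
        chain[i + 1] = idx->blocks[chain[i]].child[path[i]];

    struct tree_block copy = idx->blocks[chain[depth]];
    copy.version++;               /* the changed entry lives in the leaf */
    uint32_t new_off = append_block(idx, &copy);

    for (int i = depth - 1; i >= 0; i--) {
        copy = idx->blocks[chain[i]];
        copy.child[path[i]] = new_off; /* re-point at the new child */
        new_off = append_block(idx, &copy);
    }
    idx->last.generation++;
    idx->last.root = new_off;
    return new_off;
}
```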
> The index size will increase continuously. At some point, we will need
> to repack it. Let's assume a tree block is 64k on average and a path
> generally consists of 3 path components. That means an entry update
> adds 192k and we can do about 80 updates before the index reaches 16M
> (in addition to the initial index size).
>
> At 16M, or when the trailer generation hits a limit (the limit could be
> configurable), we rewrite the index to reduce its size. Some heavy
> operations, such as checkout or reset, can also rewrite the index.
>
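The sizing above works out as 3 * 64k = 192k per update and 16M / 192k, roughly 85 updates, consistent with "about 80". A small C sketch of that arithmetic plus a rewrite check; `should_rewrite()` is hypothetical, and applying the 16M limit to the bytes appended since the last rewrite (rather than to the total file size) is an assumption, since the proposal leaves the exact policy open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes appended by one entry update: every tree block on the path from
 * the root to the entry is rewritten, one block per path component. */
static uint64_t bytes_per_update(uint64_t avg_block_size, unsigned components)
{
    return avg_block_size * components;
}

/* How many updates fit before `budget` extra bytes have been appended. */
static uint64_t updates_before_rewrite(uint64_t avg_block_size,
                                       unsigned components, uint64_t budget)
{
    return budget / bytes_per_update(avg_block_size, components);
}

/* Hypothetical policy: rewrite once the bytes appended since the last
 * rewrite reach the size limit, or the trailer generation reaches a
 * configurable limit. The generation field is 2 bytes, so that limit
 * can be at most 65535. */
static bool should_rewrite(uint64_t appended_bytes, uint64_t size_limit,
                           uint16_t generation, uint16_t generation_limit)
{
    return appended_bytes >= size_limit || generation >= generation_limit;
}
```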
> The index integrity is verified by crc32. One crc32 covers the header
> and the trailer, and each block has its own crc32. When the index is
> found corrupt, we could try to roll back to the latest good version by
> looking for trailers from the bottom up. Even when the index is not
> corrupt, users can still look back this way for older index versions.
>
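A sketch of the integrity check and the bottom-up roll-back described above. It assumes all multi-byte fields are big-endian (the proposal does not say) and uses a hand-rolled CRC-32 so the example is self-contained; git already links zlib, whose `crc32()` computes the same checksum. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 8
#define TRAILER_SIZE 18

/* Standard reflected CRC-32 (polynomial 0xEDB88320), the variant zlib's
 * crc32() computes. Passing a previous result continues the checksum. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t get_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* The trailer checksum covers the 8-byte header plus the first 14 bytes
 * of the trailer (everything except the checksum field itself). */
static uint32_t trailer_crc(const unsigned char *index, size_t trailer_off)
{
    uint32_t crc = crc32_update(0, index, HEADER_SIZE);
    return crc32_update(crc, index + trailer_off, TRAILER_SIZE - 4);
}

static int trailer_is_valid(const unsigned char *index, size_t size, size_t off)
{
    if (off < HEADER_SIZE || off + TRAILER_SIZE > size)
        return 0;
    if (memcmp(index + off, "ROOT", 4) != 0)
        return 0;
    return trailer_crc(index, off) == get_be32(index + off + TRAILER_SIZE - 4);
}

/* Scan from the end towards the header for the newest trailer whose CRC
 * checks out. A stray "ROOT" inside block content is rejected by the CRC
 * check (with high probability). Returns the trailer offset, or 0 if no
 * trailer is valid (0 can never be a trailer offset). */
static size_t find_last_good_trailer(const unsigned char *index, size_t size)
{
    if (size < HEADER_SIZE + TRAILER_SIZE)
        return 0;
    for (size_t off = size - TRAILER_SIZE; off >= HEADER_SIZE; off--)
        if (trailer_is_valid(index, size, off))
            return off;
    return 0;
}
```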

I am not sure how the trailer works.
I assume there can be multiple trailers, and each update generates a
new one. Every trailer points to the root tree (i.e. do all trailers
point to the same block?). So if there are some changes to the root,
like a rename, do all trailers point to the latest root block?

Does the index look like this:
| HEADER | TREE BLOCKS | TRAILER | TREE BLOCKS | TRAILER | TREE BLOCKS | TRAILER | ...

Blocks and trailers are interleaved. The index starts with a few
blocks (git add file1 file2 file3 ...) and expands as it goes. If file1
is updated, the tree block containing file1 is updated and appended.
(At this point, are there 2 versions of the tree block containing file1
in the index?) How do you organize these 2 blocks in a tree?

Are the appended blocks also a tree, or just a list? If it's a list, it
needs O(n) read time. If it's a subtree, I assume it's small, because
I guess there won't be many changes each time. If it's too small, then
lg n -> n, and the total read time -> n.

> = The git index file has the following format
>
>   - An 8-byte header consisting of
>
>     4-byte signature:
>       The signature is { 'T', 'R', 'E', 'E' }
>
>     4-byte version number:
>       The currently supported version is 1.
>
>   - A number of blocks of variable size
>
>      1-byte block type
>
>      3-byte content size in bytes
>
>      block content
>
>      4-byte crc32 of all of the above
>
>   - An 18-byte trailer consisting of
>
>      4-byte trailer signature:
>        The signature is { 'R', 'O', 'O', 'T' }
>
>      2-byte generation:
>         The first trailer is 0, the second 1 and so on.
>
>      4-byte root block offset
>
>      4-byte extension table offset:
>        Zero means no extension
>
>      4-byte checksum:
>        CRC32 of the header and the trailer (excluding this field)
>
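A sketch of reading the on-disk layout above. It assumes big-endian fields, as in the current DIRC index (the proposal does not state a byte order), and the function names are hypothetical. Each block carries 8 bytes of framing (1-byte type, 3-byte size, 4-byte crc32) around its content, which also caps a single block at 16M - 1 bytes of content.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: multi-byte fields are big-endian (network order). */
static uint32_t be24(const unsigned char *p)
{
    return (uint32_t)p[0] << 16 | (uint32_t)p[1] << 8 | p[2];
}
static uint32_t be32(const unsigned char *p) { return (uint32_t)p[0] << 24 | be24(p + 1); }
static uint16_t be16(const unsigned char *p) { return (uint16_t)(p[0] << 8 | p[1]); }

struct block_frame {
    char type;                   /* 'T', 'X' or 'E' */
    uint32_t content_size;       /* 3-byte field, so at most 16M - 1 */
    const unsigned char *content;
    uint32_t crc;                /* crc32 of type, size and content */
    size_t total_size;           /* 1 + 3 + content_size + 4 */
};

struct index_trailer {
    uint16_t generation;
    uint32_t root_offset;
    uint32_t ext_table_offset;   /* 0 means no extension */
    uint32_t checksum;
};

/* 8-byte header: "TREE" + 4-byte version. Returns the version or -1. */
static int parse_header(const unsigned char *buf, size_t size)
{
    if (size < 8 || memcmp(buf, "TREE", 4) != 0)
        return -1;
    return (int)be32(buf + 4);
}

static int parse_block(const unsigned char *buf, size_t size, size_t off,
                       struct block_frame *out)
{
    if (off + 4 > size)
        return -1;
    out->type = (char)buf[off];
    out->content_size = be24(buf + off + 1);
    out->total_size = 4 + (size_t)out->content_size + 4;
    if (off + out->total_size > size)
        return -1;
    out->content = buf + off + 4;
    out->crc = be32(buf + off + 4 + out->content_size);
    return 0;
}

static int parse_trailer(const unsigned char *buf, size_t size, size_t off,
                         struct index_trailer *t)
{
    if (off + 18 > size || memcmp(buf + off, "ROOT", 4) != 0)
        return -1;
    t->generation = be16(buf + off + 4);
    t->root_offset = be32(buf + off + 6);
    t->ext_table_offset = be32(buf + off + 10);
    t->checksum = be32(buf + off + 14);
    return 0;
}
```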
> == Tree block
>
>  A tree block contains a (maybe invalid) tree object and extra
>  information about its companion in the working directory. A tree
>  block has block type 'T'.
>
>  Tree block content is basically the list of non-recursive entries in
>  the specified path, with all the attributes we store in the index now.
>  There are a few changes, though, to integrate cache-tree and allow
>  bsearch() on an mmap'd block.
>
>  A tree block content consists of
>
>  - 4-byte tree object size
>
>  - 20-byte SHA-1 of the cached tree object
>
>  - a list of attributes corresponding to the tree object's items, in
>    the same order. These attributes are the same as in the DIRC entry
>    format, except that the entry name is removed and a tree block
>    offset is added in case the item is a directory.
>
>    32-bit ctime seconds, the last time a file's metadata changed
>      this is stat(2) data
>
>    32-bit ctime nanosecond fractions
>      this is stat(2) data
>
>    32-bit mtime seconds, the last time a file's data changed
>      this is stat(2) data
>
>    32-bit mtime nanosecond fractions
>      this is stat(2) data
>
>    32-bit dev
>      this is stat(2) data
>
>    32-bit ino
>      this is stat(2) data
>
>    32-bit mode, split into (high to low bits)
>
>      4-bit object type
>        valid values in binary are 1000 (regular file), 1010 (symbolic link)
>        and 1110 (gitlink)
>
>      3-bit unused
>
>      9-bit unix permission. Only 0755 and 0644 are valid for regular files.
>      Symbolic links and gitlinks have value 0 in this field.
>
>    32-bit uid
>      this is stat(2) data
>
>    32-bit gid
>      this is stat(2) data
>
>    32-bit file size
>      This is the on-disk size from stat(2), truncated to 32-bit.
>
>    160-bit SHA-1 of the represented object if it is a blob, or the
>      offset of another tree block if it is a tree
>
>    A 32-bit 'flags' field split into (high to low bits)
>
>      1-bit assume-valid flag
>
>      1-bit extended flag (must be zero in version 2)
>
>      2-bit stage (during merge)
>
>      12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
>      is stored in this field.
>
>      1-bit skip-worktree flag (used by sparse checkout)
>
>      1-bit intent-to-add flag (used by "git add -N")
>
>      14-bit unused, must be zero
>
>    A 16-bit offset, relative to the beginning of this block, to the
>      pathname of this entry. FIXME: make it 32-bit, relative to the
>      beginning of the file, so that we can reuse pathnames from other
>      (old) blocks?
>

Separating the pathnames so that a block can be searched with bsearch()
is nice. But if all names are shared by all blocks, this pathname tree
will be loaded for every operation, and I guess the load & hash is
expensive.
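The quoted mode and flags fields each pack several sub-fields into one 32-bit word. A sketch of decoding them with shifts and masks; the helper names are hypothetical, and because the mode layout lists only 16 of its 32 bits, it assumes those sit in the low 16 bits as in the current DIRC format.

```c
#include <assert.h>
#include <stdint.h>

/* Object types from the 4-bit type field (binary 1000, 1010, 1110). */
enum { TYPE_REGULAR = 0x8, TYPE_SYMLINK = 0xA, TYPE_GITLINK = 0xE };

/* Assumption: the 4+3+9 listed mode bits are the low 16 bits. */
static unsigned mode_type(uint32_t mode) { return (mode >> 12) & 0xF; }
static unsigned mode_perm(uint32_t mode) { return mode & 0x1FF; }

static int mode_is_valid(uint32_t mode)
{
    if (mode >> 16)          /* high 16 bits assumed to be zero */
        return 0;
    if (mode & 0xE00)        /* 3-bit unused field: bits 9..11 */
        return 0;
    switch (mode_type(mode)) {
    case TYPE_REGULAR:
        return mode_perm(mode) == 0755 || mode_perm(mode) == 0644;
    case TYPE_SYMLINK:
    case TYPE_GITLINK:
        return mode_perm(mode) == 0;
    default:
        return 0;
    }
}

/* 32-bit flags, high to low: assume-valid(1) extended(1) stage(2)
 * name-length(12) skip-worktree(1) intent-to-add(1) unused(14). */
struct entry_flags {
    unsigned assume_valid, extended, stage, name_len;
    unsigned skip_worktree, intent_to_add;
};

static struct entry_flags decode_flags(uint32_t f)
{
    struct entry_flags e;
    e.assume_valid  = (f >> 31) & 1;
    e.extended      = (f >> 30) & 1;
    e.stage         = (f >> 28) & 3;
    e.name_len      = (f >> 16) & 0xFFF; /* 0xFFF means "0xFFF or longer" */
    e.skip_worktree = (f >> 15) & 1;
    e.intent_to_add = (f >> 14) & 1;
    return e;
}
```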

>  - a list of NUL-terminated pathnames, pointed to by the 16-bit offset
>    above. This list does not have to be in the same order as the
>    attribute list. It is separated from the attribute list to make the
>    attribute list fixed-size and therefore searchable using bsearch().
>
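The point of the separate pathname list is that the attribute records have a fixed size, so a lookup by name can binary-search them and follow each record's offset into the pathname list. A sketch with the standard `bsearch()`; `struct attr` is a hypothetical, trimmed-down record, the offset is taken relative to the start of the pathname list rather than the block for simplicity, and plain `strcmp()` ordering is assumed (git actually sorts tree entries as if directory names ended in '/').

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size attribute record; stat data, SHA-1 and flags
 * are folded into one stand-in field. */
struct attr {
    uint16_t name_off;  /* offset of the NUL-terminated name in the pool */
    uint32_t data;      /* stand-in for the remaining attribute fields */
};

struct lookup_key {
    const char *name;
    const char *pool;   /* the block's pathname list */
};

/* bsearch() passes the key first and the array element second. */
static int cmp_key_attr(const void *k, const void *e)
{
    const struct lookup_key *key = k;
    const struct attr *a = e;
    return strcmp(key->name, key->pool + a->name_off);
}

/* The attribute list follows tree-object order, i.e. it is sorted by
 * name, so it can be binary-searched even though the pathnames in the
 * pool may be stored in any order. */
static const struct attr *find_entry(const struct attr *attrs, size_t nr,
                                     const char *pool, const char *name)
{
    struct lookup_key key = { name, pool };
    return bsearch(&key, attrs, nr, sizeof(*attrs), cmp_key_attr);
}
```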
> == Extension table block
>
>  Extension table has block type 'X'. It consists of a series of 4-byte
>  extension block offsets.
>
> == Extension block
>
>  Extension block has block type 'E'. Extension content is the same as
>  in the old format.
> -- 8< --
> --
> Duy
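Reading the extension table described above amounts to splitting an 'X' block's content into 4-byte offsets. A sketch, assuming big-endian offsets and a hypothetical `read_extension_table()` helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Content of an 'X' block: a packed series of 4-byte extension block
 * offsets. Returns the number of offsets written to `out`, or -1 if the
 * content size is not a multiple of 4 or `out` is too small. */
static int read_extension_table(const unsigned char *content, size_t size,
                                uint32_t *out, size_t max)
{
    if (size % 4 != 0 || size / 4 > max)
        return -1;
    for (size_t i = 0; i < size / 4; i++) {
        const unsigned char *p = content + 4 * i;
        out[i] = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                 (uint32_t)p[2] << 8 | p[3];
    }
    return (int)(size / 4);
}
```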

-Elton