Re: GSoC - Designing a faster index format

elton sky <eltonsky9404@xxxxxxxxx> · Mon, 26 Mar 2012 23:41:32 +1100

The previous email was hidden in the trimmed area, so I am resending it:

About the new format:

The index is a single file. Entries are still stored sequentially, as
in the old format. The difference is that they are grouped into
blocks. A block contains many entries, ordered by name. Blocks are
also ordered, by the name of their first entry. Each block contains a
SHA-1 of the entries in it.
To allow a binary search for the block that holds an entry, the
offsets of the blocks are stored in the header of the index. We
reserve 100 slots for block offsets in the header. Further offsets are
stored in meta blocks (see below), and the header stores the offset of
the first meta block.
The checksum is computed per block. After we locate the block, the
checksum is recomputed for that block, and only this block is read
and later written back. Once the block is in RAM, it is easy to do a
binary search for an entry within it.
When the index doesn't have many entries, it works much like the
current format. As more entries are git-added, blocks come into play.
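
The two-level lookup described above can be sketched roughly as
follows. This is only an illustration in Python, not proposed git
code: the names find_block and find_entry are made up, blocks are
modelled as in-memory lists of entry names, and a real implementation
would read the first entry name of a block through its stored offset.

```python
from bisect import bisect_left, bisect_right

def find_block(first_names, name):
    """Return the index of the only block that could hold `name`, or -1."""
    # Blocks are ordered by their first entry, so the candidate is the
    # last block whose first entry sorts at or before `name`.
    return bisect_right(first_names, name) - 1

def find_entry(blocks, name):
    """Return (block index, entry index) for `name`, or None if absent."""
    first_names = [block[0] for block in blocks]
    i = find_block(first_names, name)
    if i < 0:
        return None  # sorts before the first entry of the first block
    block = blocks[i]
    # Second binary search, this time inside the block held in memory.
    j = bisect_left(block, name)
    if j < len(block) and block[j] == name:
        return (i, j)
    return None

# Hypothetical index contents: three blocks, each sorted by name.
blocks = [
    ["Makefile", "README"],
    ["builtin/add.c", "cache.h"],
    ["read-cache.c", "t/test-lib.sh"],
]
```

With this data, find_entry(blocks, "cache.h") touches only the second
block, which is the point of the design: one block is read, searched
and later rewritten, not the whole index.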

Format:

Head:
- 4-byte signature
- 4-byte version number
- 4-byte number of entry blocks
- 4-byte offset for new block
- list of offsets for blocks (e.g. 96, 14096, 8192, ..): for binary
search. Each offset is 4 bytes; we reserve 100 x 4 = 400 bytes for
the first 100 blocks. More offsets (if applicable) are stored in
meta blocks.
- 4-byte offset to the first meta block
- 20-byte SHA-1 over the fields above and the meta blocks
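
To make the header layout concrete, here is a rough Python sketch that
packs and unpacks it. Several things are assumptions on my side and
not settled by the list above: big-endian byte order, 4-byte block
offsets (following the 100 x 4 = 400 arithmetic), zero-filled unused
slots, and the placeholder signature and version used in the example.
The helper names pack_header and unpack_header are made up.

```python
import hashlib
import struct

MAX_INLINE = 100  # offset slots reserved in the header
# signature, version, number of blocks, offset for new block,
# 100 block offsets, offset of first meta block (SHA-1 follows).
FIXED = struct.Struct(">4sIII%dII" % MAX_INLINE)

def pack_header(signature, version, offsets, new_block_offset,
                first_meta_offset, meta_bytes=b""):
    """Serialize the header; only offsets that fit inline are handled."""
    if len(offsets) > MAX_INLINE:
        raise ValueError("extra offsets belong in meta blocks")
    padded = list(offsets) + [0] * (MAX_INLINE - len(offsets))
    fixed = FIXED.pack(signature, version, len(offsets),
                       new_block_offset, *padded, first_meta_offset)
    # The checksum covers the fields above plus the meta blocks.
    return fixed + hashlib.sha1(fixed + meta_bytes).digest()

def unpack_header(data, meta_bytes=b""):
    """Verify the checksum and return the header fields."""
    fixed = data[:FIXED.size]
    digest = data[FIXED.size:FIXED.size + 20]
    if hashlib.sha1(fixed + meta_bytes).digest() != digest:
        raise ValueError("header checksum mismatch")
    fields = FIXED.unpack(fixed)
    signature, version, nblocks, new_block_offset = fields[:4]
    offsets = list(fields[4:4 + min(nblocks, MAX_INLINE)])
    first_meta_offset = fields[4 + MAX_INLINE]
    return signature, version, offsets, new_block_offset, first_meta_offset

# Placeholder values, for illustration only.
header = pack_header(b"DIRC", 5, [440, 4536], 8632, 0)
```

Under these assumptions the header is 4 + 4 + 4 + 4 + 400 + 4 + 20 =
440 bytes, so the first block could start at offset 440.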

List of Blocks:
- SHA-1 of all entries in the block
- list of entries
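
A minimal sketch of the per-block checksum, again in Python and only
as an illustration. The entry encoding here (just NUL-terminated
names) is an assumption to keep the example short; real entries carry
stat data, flags and object ids. pack_block and unpack_block are
made-up names.

```python
import hashlib

def pack_block(names):
    """Serialize a block: 20-byte SHA-1 of the entries, then the entries."""
    # Entries are kept sorted by name inside the block.
    body = b"".join(n + b"\0" for n in sorted(s.encode() for s in names))
    return hashlib.sha1(body).digest() + body

def unpack_block(data):
    """Recompute the SHA-1 for this block only and return its entry names."""
    digest, body = data[:20], data[20:]
    if hashlib.sha1(body).digest() != digest:
        raise ValueError("block checksum mismatch")
    # Each name is NUL-terminated, so the final split element is empty.
    return [n.decode() for n in body.split(b"\0")[:-1]]

block = pack_block(["cache.h", "Makefile"])
```

Because the checksum covers a single block, changing one entry means
rehashing and rewriting that block only, not the whole index.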

Meta block:
- offset to next meta block
- list of offsets
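
Collecting the extra offsets means following the chain of meta blocks.
A rough Python sketch, with assumptions that are not part of the
layout above: offsets are 4-byte big-endian, a next offset of 0 ends
the chain, and each meta block carries a 4-byte count before its list
of offsets. pack_meta and read_all_offsets are made-up names.

```python
import struct

def pack_meta(next_offset, offsets):
    """Serialize one meta block: next offset, count (assumed), offsets."""
    return struct.pack(">II%dI" % len(offsets),
                       next_offset, len(offsets), *offsets)

def read_all_offsets(buf, first_meta_offset):
    """Follow the meta block chain and return all block offsets in order."""
    offsets = []
    seen = set()
    pos = first_meta_offset
    while pos != 0:
        if pos in seen:
            raise ValueError("cycle in meta block chain")
        seen.add(pos)
        next_offset, count = struct.unpack_from(">II", buf, pos)
        offsets.extend(struct.unpack_from(">%dI" % count, buf, pos + 8))
        pos = next_offset
    return offsets

# Hypothetical file image: 16 bytes of padding, then two chained meta
# blocks at offsets 16 and 32.
image = b"\0" * 16 + pack_meta(32, [1000, 2000]) + pack_meta(0, [3000])
```

The binary search would then run over the 100 header offsets followed
by the offsets gathered here.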

Extensions:
      TBD. I have not hacked on the cache tree yet and need to learn more about it.

Block Split & Delete:
      TBD.

Regards,
Elton