Re: GSoC - Designing a faster index format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

Some updates for Nguyen's index:

On Wed, Apr 4, 2012 at 10:20 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> On Wed, Apr 4, 2012 at 3:26 PM, elton sky <eltonsky9404@xxxxxxxxx> wrote:
>> I am not sure how the trailer works.
>> I assume there can be multiple trailers, each update will generate a
>> new one. Every trailer will point to the root tree (i.e. all trailers
>> point to the same block?). So if there are some changes to root, like
>> rename, trailers all point to the latest root block?
>
> Each trailer points to the whole new tree. Because trees are
> immutable, changing in a tree meangs creating a new one and will also
> make a new parent tree (to point to the updated tree because old
> parent will always point to old tree). This eventually leads to root
> tree change, recorded by the trailer.
>
>> Is the index looks like :
>> | HEADER | TREE BLOCKS | TRAILER |  TREE BLOCKS | TRAILER | TREE
>> BLOCKS | TRAILER | ...
>>

Once an update happened to a block, all parent blocks to root will be
copied to the end of the index. And the updated block will be at leaf
of the new tree. Finding this path costs logn time anyway. This is no
harm for read for the whole tree. But in order to find all previous
changes to a block, we have to go through all trailers and trees.

Otherwise, just modify the original tree. Let the parent points to the
updated block and let the updated block points to the old block:

parent
   |
  V
updated            old                  old
block (v3)  -->   block(v2)  -->  block (v1)
   |
  V
child
blocks

A version number in a tree block is used to track the changes.
In this way, there's still no harm to read, and it's more easy to
trace the change history of a block. Also, we don't need to create
interleaved blocks and trailers. There's only one trailer in the end
of file. We also need to add another offset points to previous
version.

Trailer is kept at the end of index, as its size is variable. It
contains offset to root and list of free spaces.

Changes to format:

> = The git index file has the following format

As is, except there's only one trailer now. And trailer contains list
of free spaces.

>- A 18-byte trailer consisting of
>
>      4-byte trailer signature:
>        The signature is { 'R', 'O', 'O', 'T' }
>
>      2-byte generation:
>         The first trailer is 0, the second 1 and so on.
>
>      4-byte root block offset
>
>      4-byte extension table offset:
>        Zero means no extension
>
       list of free spaces
       - 4 byte offset
       - 2 byte length

>      4-byte checksum:
>        CRC32 of the header and the trailer (excluding this field)

Free space list is read/written in whole for each operation, together
with trailer.

>
> == Tree block
> ...

Above as is.

- 1 byte version num

- 4 byte offset to previous version block

>
>    160-bit SHA-1 for the represented object if blobs or the offset
>      to another tree block if trees
>
>    A 32-bit 'flags' field split into (high to low bits)
>
>      1-bit assume-valid flag
>
>      1-bit extended flag (must be zero in version 2)
>
>      2-bit stage (during merge)
>
>      12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
>      is stored in this field.
>
>      1-bit skip-worktree flag (used by sparse checkout)
>
>      1-bit intent-to-add flag (used by "git add -N")
>
>      14-bit unused, must be zero
>
>    A 16-bit offset, relative to the beginning of this block, to the
>      pathname of this entry. FIXME: make it 32-bit, relative to the
>      beginning of the file, so that we can reuse pathnames from other
>      (old) blocks?
>

-Elton
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]