NOTE: Please ignore the >- checkpoint : interleave with items in the block, 1 for every 100 items > 4-byte offset to next checkpoint That's irrelevant. Cheers, Elton On Fri, Apr 6, 2012 at 1:13 PM, elton sky <eltonsky9404@xxxxxxxxx> wrote: > Thank you everyone for ideas, clues, explanations and questions. > Collectively I wrote my proposal. This one is mostly based on Duy's > suggestion with minor changes. > > > Problem: > Current index store all files in a list. This implies that: > 1. Each operation has to read the whole index and write the whole index back. > 2. computing the checksum for the whole index. > These become expensive when the repo is large. > > Requirement: > Suppose n is the number objects in a repo > -Keep the read/write time <= O(logn). > -Keep the checksum computed against only necessary objects. > -New format is easy to parse. > -Backward compatible. > -Potentially use faster hash method. > > Proposed solution: > > Store the repo structure in a canonical tree. Each directory is a tree > block. A tree block contains blobs and offsets to sub directories. It > has its own checksum. A read/write will be done on tree block base. A > tree block also contains an offset points to its previous version (if > there's one). > > The root of offset is stored in trailer, which stays at the end of > file. Each update creates a new trailer which points to the new tree. > (details below) > > To save the pain of modify a tree block and track the free spaces in > the index, changed blocks are appended at the end. Based on the > assumption that user won't change too many files each time, for an > updated file, all its parent blocks to Root was copied and appended to > index. In other words, all traversed blocks are copied. The offset of > previous version of the updated block is stored in the new block. A > new generation number is stored in copied and updated blocks. Other > blocks are not copied. They are referenced by offsets. After update a > new trailer is created at the end. In this way, there's no harm to > read, and makes write fast. > > Trailer stores the offset of previous trailer. It makes tracking old > versions easy. > > Each operation will load and rewrite the header and visited trailer . > > Checksum for non identifier purpose will use crc32. Otherwise it uses sha1. > > For compatibility, old format of index will be transformed to new > format in the first operation. > > == > Index format: > > - A 8-byte header consisting of > > 4-byte signature: > The signature is { 'T', 'R', 'E', 'E' } > > 4-byte version number: > The current supported versions are 4. > > - A number of blocks of variable size > > 1-byte block type > > 3-byte content size in byte > > block content > > 4-byte crc32 of all above > > - A 20-byte trailer consisting of > > 4-byte trailer signature: > The signature is { 'R', 'O', 'O', 'T' } > > 4-byte root block offset > > 4-byte extension table offset: > Zero means no extension > > 4-byte offset to previous trailer > > 4-byte checksum: > CRC32 of the header and the trailer (excluding this field) > > == > Tree block: > > Tree block content is basically the list of entries in a specified > path, with all attributes we store in the index now. This entry list > is sorted by pathname. For doing a bsearch in the list, pathnames are > stored at the end of block, which makes the size of entry fixed. The > pathname is pointed from each entry with 2 byte offset (relative to a > block). This should not be problem as a block is never modified. > > It stores the generation number and the offset to old block. It also > integrates the content of cache-tree. > > A tree block content consists of > > - 4-byte tree object size > > - 20-byte SHA-1 of the cached tree object > > - checkpoint : interleave with items in the block, 1 for every 100 items > 4-byte offset to next checkpoint > > - a list attributes corresponding to tree object's item, in the same > order. These attributes are the same as in DIRC entry format > except that entry name is removed, and a tree block offset is > added in case the item is a directory. > > 32-bit ctime seconds, the last time a file's metadata changed > this is stat(2) data > > 32-bit ctime nanosecond fractions > this is stat(2) data > > 32-bit mtime seconds, the last time a file's data changed > this is stat(2) data > > 32-bit mtime nanosecond fractions > this is stat(2) data > > 32-bit dev > this is stat(2) data > > 32-bit ino > this is stat(2) data > > 32-bit mode, split into (high to low bits) > > 4-bit object type > valid values in binary are 1000 (regular file), 1010 (symbolic link) > and 1110 (gitlink) > > 3-bit unused > > 9-bit unix permission. Only 0755 and 0644 are valid for regular files. > Symbolic links and gitlinks have value 0 in this field. > > 32-bit uid > this is stat(2) data > > 32-bit gid > this is stat(2) data > > 32-bit file size > This is the on-disk size from stat(2), truncated to 32-bit. > > 160-bit SHA-1 for the represented object if blobs or the offset > to another tree block if trees > > A 32-bit 'flags' field split into (high to low bits) > > 1-bit assume-valid flag > > 1-bit extended flag (must be zero in version 2) > > 2-bit stage (during merge) > > 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF > is stored in this field. > > 1-bit skip-worktree flag (used by sparse checkout) > > 1-bit intent-to-add flag (used by "git add -N") > > 14-bit unused, must be zero > > A 16-bit offset, relative to the beginning of this block, to the > pathname of this entry > > - 2-byte generation number, starts from 1 > > - 4-byte previous version offset > > - a list of NUL-terminated pathnames, pointed to from the 16-bit offset > above > > == Extension table block > > Extension table has block type 'X'. It consists of a series of 4-byte > extension block offset. > > == Extension block > > Extension block has block type 'E'. Extension content is the same as > in the old format. > > > > Time line: > > 24/04 ~ 21/05: get familiar with code base and revise proposal > benchmark with linux kernel on major operations > write prototype to prove feasibility of proposed solution > consult mailing list & irc > > 22/05 ~ 25/06 write code, test and benchmark > modify tree lib > modify index format operations > modify git operations > transform from old to new > > 26/06 ~ 30/07 revise things according to benchmark > > 31/07 ~ 13/08 update documentation > > > About me > > My name is Elton Tian, I am from China. I have been living in > Australia for quite a few years. I am currently a Master student from > Australia National University. After graduate I worked on linux based > web development (using tcl) for 2.5 years. I have been programming > with c, c#, java, tcl, php, javascript and shell script. But I prefer > c, which gives me the feeling of full control over the program. I am > interested in data intensive computing. I played with hadoop and > mapreduce since 2010. I maintained my 3 node cluster behind my desk. > As linus hates cvs with a passion, I still want to mention I was using > cvs in work, don't like it though. And I guess this is the good chance > to get over it. > > On Thu, Apr 5, 2012 at 2:22 AM, elton sky <eltonsky9404@xxxxxxxxx> wrote: >> Hello, >> >> Some updates for Nguyen's index: >> >> On Wed, Apr 4, 2012 at 10:20 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote: >>> On Wed, Apr 4, 2012 at 3:26 PM, elton sky <eltonsky9404@xxxxxxxxx> wrote: >>>> I am not sure how the trailer works. >>>> I assume there can be multiple trailers, each update will generate a >>>> new one. Every trailer will point to the root tree (i.e. all trailers >>>> point to the same block?). So if there are some changes to root, like >>>> rename, trailers all point to the latest root block? >>> >>> Each trailer points to the whole new tree. Because trees are >>> immutable, changing in a tree meangs creating a new one and will also >>> make a new parent tree (to point to the updated tree because old >>> parent will always point to old tree). This eventually leads to root >>> tree change, recorded by the trailer. >>> >>>> Is the index looks like : >>>> | HEADER | TREE BLOCKS | TRAILER | TREE BLOCKS | TRAILER | TREE >>>> BLOCKS | TRAILER | ... >>>> >> >> Once an update happened to a block, all parent blocks to root will be >> copied to the end of the index. And the updated block will be at leaf >> of the new tree. Finding this path costs logn time anyway. This is no >> harm for read for the whole tree. But in order to find all previous >> changes to a block, we have to go through all trailers and trees. >> >> Otherwise, just modify the original tree. Let the parent points to the >> updated block and let the updated block points to the old block: >> >> parent >> | >> V >> updated old old >> block (v3) --> block(v2) --> block (v1) >> | >> V >> child >> blocks >> >> A version number in a tree block is used to track the changes. >> In this way, there's still no harm to read, and it's more easy to >> trace the change history of a block. Also, we don't need to create >> interleaved blocks and trailers. There's only one trailer in the end >> of file. We also need to add another offset points to previous >> version. >> >> Trailer is kept at the end of index, as its size is variable. It >> contains offset to root and list of free spaces. >> >> Changes to format: >> >>> = The git index file has the following format >> >> As is, except there's only one trailer now. And trailer contains list >> of free spaces. >> >>>- A 18-byte trailer consisting of >>> >>> 4-byte trailer signature: >>> The signature is { 'R', 'O', 'O', 'T' } >>> >>> 2-byte generation: >>> The first trailer is 0, the second 1 and so on. >>> >>> 4-byte root block offset >>> >>> 4-byte extension table offset: >>> Zero means no extension >>> >> list of free spaces >> - 4 byte offset >> - 2 byte length >> >>> 4-byte checksum: >>> CRC32 of the header and the trailer (excluding this field) >> >> Free space list is read/written in whole for each operation, together >> with trailer. >> >>> >>> == Tree block >>> ... >> >> Above as is. >> >> - 1 byte version num >> >> - 4 byte offset to previous version block >> >>> >>> 160-bit SHA-1 for the represented object if blobs or the offset >>> to another tree block if trees >>> >>> A 32-bit 'flags' field split into (high to low bits) >>> >>> 1-bit assume-valid flag >>> >>> 1-bit extended flag (must be zero in version 2) >>> >>> 2-bit stage (during merge) >>> >>> 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF >>> is stored in this field. >>> >>> 1-bit skip-worktree flag (used by sparse checkout) >>> >>> 1-bit intent-to-add flag (used by "git add -N") >>> >>> 14-bit unused, must be zero >>> >>> A 16-bit offset, relative to the beginning of this block, to the >>> pathname of this entry. FIXME: make it 32-bit, relative to the >>> beginning of the file, so that we can reuse pathnames from other >>> (old) blocks? >>> >> >> -Elton -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html