Re: GSoC - Designing a faster index format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 26, 2012 at 9:28 PM, Thomas Rast <trast@xxxxxxxxxxxxxxx> wrote:
> Doesn't that venture into database land?

How about this (a bit like memory management). Maybe it's simpler than
a database and fits us better.

The header consists of crc32 and three uint32_t, one points to the
root tree, one the first extension block, the last one the free list
at the end of the file. The rest of the file contains sizable blocks.
There can be free space between them. Free spaces (offset and size)
are recorded at the end of the file, pointed in header. The header's
crc32 covers the header and free list.

When we need a new block, we look up in free list. If we cannot find a
suitable space, we append to the end of the file (moving free list
further to keep it always the end of the file). Removing a block means
marking it in free list. We only truncate if there is free space at
the end. Operations that we know will scratch the whole index are our
opportunity to rewrite the index and make it compact again. No random
garbage collection (iow disk is cheap).

A block starts with a signature (a tree block, or an extension...). A
tree block consists of:

 - uint32_t tree object's size
 - sha-1 of tree object
 - crc32 of the rest of the block except tree object
 - maybe reference counter of a block can be refered by many blocks??
 - tree object (i.e. something that tree-walk.c can parse)
 - other index attributes, stored separately in the same order as in
tree object above, uint32_t block offset of subdirectories.

An extension block basically consists of what we have now in an
extension plus uint32_t offset to the next extension block, so we can
keep track of all extensions. crc32 is used for extension blocks.

This way we only need to verify checksum of the header (and free list)
and blocks we visit. We don't need cache-tree extension because it's
part of the format. There will be headache with unpack-trees.c because
of entry order change. But in the end we would use the same order tree
objects are using now, much simpler for us.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]