Re: GSoC - Designing a faster index format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Nguyen,

Thanks for the idea. just a few questions

On Tue, Mar 27, 2012 at 3:19 AM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> On Mon, Mar 26, 2012 at 9:28 PM, Thomas Rast <trast@xxxxxxxxxxxxxxx> wrote:
>> Doesn't that venture into database land?
>
> How about this (a bit like memory management). Maybe it's simpler than
> a database and fits us better.
>
> The header consists of crc32 and three uint32_t, one points to the
> root tree, one the first extension block, the last one the free list
> at the end of the file. The rest of the file contains sizable blocks.
> There can be free space between them. Free spaces (offset and size)
> are recorded at the end of the file, pointed in header. The header's
> crc32 covers the header and free list.

How do you record free spaces at the end of file? Are you gonna have a
fixed size for the index and reserve space for free spaces offsets.

>
> When we need a new block, we look up in free list. If we cannot find a
> suitable space, we append to the end of the file (moving free list
> further to keep it always the end of the file). Removing a block means
> marking it in free list. We only truncate if there is free space at
> the end. Operations that we know will scratch the whole index are our
> opportunity to rewrite the index and make it compact again. No random
> garbage collection (iow disk is cheap).
>

I agree with you. Maybe we just ignore free spaces in the index and
let a background thread to compact it.

> A block starts with a signature (a tree block, or an extension...). A
> tree block consists of:
>
>  - uint32_t tree object's size
>  - sha-1 of tree object
>  - crc32 of the rest of the block except tree object
>  - maybe reference counter of a block can be refered by many blocks??
>  - tree object (i.e. something that tree-walk.c can parse)

Do you mean each block contains a tree and all its blobs? So the tree
object here, effectively a dir, also contains files in the dir ? In
this way, some blocks can be very big.

>  - other index attributes, stored separately in the same order as in
> tree object above, uint32_t block offset of subdirectories.

There can be many sub dirs some times. But maybe not a prob.

As tree object and  offset of subdirectories are variables, how do you
make a block resizable?

>
> An extension block basically consists of what we have now in an
> extension plus uint32_t offset to the next extension block, so we can
> keep track of all extensions. crc32 is used for extension blocks.
>
> This way we only need to verify checksum of the header (and free list)
> and blocks we visit. We don't need cache-tree extension because it's
> part of the format. There will be headache with unpack-trees.c because
> of entry order change. But in the end we would use the same order tree
> objects are using now, much simpler for us.
> --
> Duy

Cheers,
Elton
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]