Hi Nguyen, Thanks for the idea. just a few questions On Tue, Mar 27, 2012 at 3:19 AM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote: > On Mon, Mar 26, 2012 at 9:28 PM, Thomas Rast <trast@xxxxxxxxxxxxxxx> wrote: >> Doesn't that venture into database land? > > How about this (a bit like memory management). Maybe it's simpler than > a database and fits us better. > > The header consists of crc32 and three uint32_t, one points to the > root tree, one the first extension block, the last one the free list > at the end of the file. The rest of the file contains sizable blocks. > There can be free space between them. Free spaces (offset and size) > are recorded at the end of the file, pointed in header. The header's > crc32 covers the header and free list. How do you record free spaces at the end of file? Are you gonna have a fixed size for the index and reserve space for free spaces offsets. > > When we need a new block, we look up in free list. If we cannot find a > suitable space, we append to the end of the file (moving free list > further to keep it always the end of the file). Removing a block means > marking it in free list. We only truncate if there is free space at > the end. Operations that we know will scratch the whole index are our > opportunity to rewrite the index and make it compact again. No random > garbage collection (iow disk is cheap). > I agree with you. Maybe we just ignore free spaces in the index and let a background thread to compact it. > A block starts with a signature (a tree block, or an extension...). A > tree block consists of: > > - uint32_t tree object's size > - sha-1 of tree object > - crc32 of the rest of the block except tree object > - maybe reference counter of a block can be refered by many blocks?? > - tree object (i.e. something that tree-walk.c can parse) Do you mean each block contains a tree and all its blobs? So the tree object here, effectively a dir, also contains files in the dir ? In this way, some blocks can be very big. > - other index attributes, stored separately in the same order as in > tree object above, uint32_t block offset of subdirectories. There can be many sub dirs some times. But maybe not a prob. As tree object and offset of subdirectories are variables, how do you make a block resizable? > > An extension block basically consists of what we have now in an > extension plus uint32_t offset to the next extension block, so we can > keep track of all extensions. crc32 is used for extension blocks. > > This way we only need to verify checksum of the header (and free list) > and blocks we visit. We don't need cache-tree extension because it's > part of the format. There will be headache with unpack-trees.c because > of entry order change. But in the end we would use the same order tree > objects are using now, much simpler for us. > -- > Duy Cheers, Elton -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html