Re: Git's database structure

On Sep 05, 2007, at 13:31:43, Julian Phillips wrote:
> And this is advantaged by having the path in the blob how? The important information here is knowing which commits touched the file - this information is expensive in git because it is snapshot based. You have to go back through all the commits looking for changes to the given path. The information you might want to cache is which commits touched the file, which you could do without changing the current data storage. Presumably you are suggesting that such a cache would be cleaner with the filename in the blob? Or do you think that it would somehow be faster to create? If so, how?

The only possible reason I can think of for moving data into the blob would be to make a POSIX-compliant git-like filesystem, and EVEN THEN you would NOT move the path out of the tree objects. In order to have somewhat consistent inodes (and also for performance when changing 4 bytes in a 40GB file) you would want to have 3 different types of "inode" objects:

1)  4-64k of (metadata + filedata)
2)  4-64k of (metadata + list of 4-64k filedata blobs)
3)  4-64k of (metadata + list of 4-64k lists of filedata blobs)
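
A rough sketch of those three object types, in Python for illustration only. Every name here (Metadata, DirectInode, IndirectInode, DoubleIndirectInode, put, read_file) is hypothetical, and the dict standing in for the object store is an assumption; none of this is git code. Note that the metadata carries permissions and size but no path.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Metadata:
    mode: int  # permissions and other attributes; deliberately no path name
    size: int  # file length in bytes


@dataclass
class DirectInode:  # type 1: metadata + file data
    meta: Metadata
    data: bytes


@dataclass
class IndirectInode:  # type 2: metadata + list of data blob hashes
    meta: Metadata
    blobs: List[str]


@dataclass
class DoubleIndirectInode:  # type 3: metadata + list of hashes of blob-hash lists
    meta: Metadata
    tables: List[str]


# Toy content-addressed store: hash -> data blob (bytes) or table (list of hashes).
Store = Dict[str, Union[bytes, List[str]]]


def put(store: Store, obj: Union[bytes, List[str]]) -> str:
    """Store a data blob or a table of hashes and return its SHA-1 hex key."""
    raw = obj if isinstance(obj, bytes) else "\n".join(obj).encode()
    key = hashlib.sha1(raw).hexdigest()
    store[key] = obj
    return key


def read_file(store: Store, inode) -> bytes:
    """Reassemble the file contents for any of the three inode types."""
    if isinstance(inode, DirectInode):
        return inode.data
    if isinstance(inode, IndirectInode):
        return b"".join(store[h] for h in inode.blobs)
    # Double-indirect: each table is itself a list of data blob hashes.
    return b"".join(store[h] for t in inode.tables for h in store[t])
```

With a layout like this, changing a few bytes in a large file means writing one new data blob, one new table, and one new inode object; every other chunk keeps its hash and is reused.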

On the other hand... that isn't GIT, it's something completely different with a very different usage pattern and set of requirements. And you still don't put the path name in the objects, just the permissions and other attributes/metadata.

<Random Thought Experiment>
You would of course want to better define those 4-64k limits for allocation and performance reasons, but a double-indirect table of 128-bit (16-byte) hashes with 64kB chunks lets you address up to 1TB of file data, and each doubling of the chunk size gives you 8 times the storage space. Furthermore, the actual double-indirect tables for an 8TB file using 128kB chunks would hold about 64M hashes, or roughly 1GB at 16 bytes each. For a more reasonable 4GB file with 32kB tables (max of 128GB) it would be about 128k hashes, or roughly 2MB of indirect hash tables. (Full 20-byte SHA1s would fit fewer hashes per table, so the limits would be somewhat lower.)
</Random Thought Experiment>
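
The arithmetic in the thought experiment can be checked with a few lines of Python. This is only a sketch: it assumes 16-byte hashes (the size that makes 64kB chunks come out to exactly 1TB), binary units throughout, and that tables are the same size as data chunks. The function names are made up for illustration.

```python
HASH_BYTES = 16  # assumption: 128-bit hashes; pass 20 for full SHA-1


def double_indirect(chunk_bytes, hash_bytes=HASH_BYTES):
    """Return (hashes per table, maximum file size in bytes)."""
    per_table = chunk_bytes // hash_bytes
    # The top table points at per_table leaf tables, and each leaf table
    # points at per_table data chunks.
    return per_table, per_table * per_table * chunk_bytes


def table_overhead(file_bytes, chunk_bytes, hash_bytes=HASH_BYTES):
    """Bytes of hashes needed to reference every data chunk and leaf table."""
    per_table = chunk_bytes // hash_bytes
    data_chunks = -(-file_bytes // chunk_bytes)  # ceiling division
    leaf_tables = -(-data_chunks // per_table)
    return (data_chunks + leaf_tables) * hash_bytes


if __name__ == "__main__":
    KB, GB, TB = 2**10, 2**30, 2**40
    for chunk in (32 * KB, 64 * KB, 128 * KB):
        per_table, limit = double_indirect(chunk)
        print(f"{chunk // KB}kB chunks: {per_table} hashes/table, "
              f"limit {limit / TB:g}TB")
    print("4GB file, 32kB chunks:", table_overhead(4 * GB, 32 * KB), "bytes of hashes")
    print("8TB file, 128kB chunks:", table_overhead(8 * TB, 128 * KB), "bytes of hashes")
```

Doubling the chunk size doubles the hashes per table at both levels and doubles the data per chunk, which is where the factor of 8 comes from.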

Cheers,
Kyle Moffett
