On Sep 05, 2007, at 13:31:43, Julian Phillips wrote:
> And this is advantaged by having the path in the blob how? The
> important information here is knowing which commits touched the
> file - this information is expensive in git because it is snapshot
> based. You have to go back through all the commits looking for
> changes to the given path. The information you might want to cache
> is which commits touched the file, which you could do without
> changing the current data storage. Presumably you are suggesting
> that such a cache would be cleaner with the filename in the blob?
> Or do you think that it would somehow be faster to create? If so,
> how?

The only possible reason I can think of for moving data into the blob
would be to make a POSIX-compliant git-like filesystem, and EVEN THEN
you would NOT move the path out of the tree objects. In order to
have somewhat consistent inodes (and also for performance when
changing 4 bytes in a 40GB file) you would want to have 3 different
types of "inode" objects:
1) 4-64k of (metadata + filedata)
2) 4-64k of (metadata + list of 4-64k filedata blobs)
3) 4-64k of (metadata + list of 4-64k lists of filedata blobs)
On the other hand... that isn't GIT, it's something completely
different with a very different usage pattern and set of
requirements. And you still don't put the path name in the objects,
just the permissions and other attributes/metadata.
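
As a rough illustration of the three "inode" types listed above, here is a Python sketch that picks the level of indirection a file of a given size would need. The 64kB chunk size, the 16-byte hash width, and the function name inode_type are assumptions made for this example only, and metadata overhead is ignored.

```python
CHUNK = 64 * 1024  # assumed chunk size in bytes (upper end of the 4-64k range)
HASH = 16          # assumed hash width in bytes (128 bits)


def inode_type(file_size, chunk=CHUNK, hash_size=HASH):
    """Return 1, 2 or 3: the inode type needed for file_size bytes.

    1 = data stored inline, 2 = one list of blob hashes,
    3 = a list of lists of blob hashes. Metadata size is ignored.
    """
    per_table = chunk // hash_size  # hashes that fit in one chunk-sized table
    if file_size <= chunk:
        return 1
    if file_size <= per_table * chunk:
        return 2
    if file_size <= per_table * per_table * chunk:
        return 3
    raise ValueError("file too large for double indirection")
```

Under these assumptions a 40GB file needs type 3, and changing 4 bytes in it means rewriting one data chunk, one second-level table, and the top-level inode object.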
<Random Thought Experiment>
You would of course want to better define those 4-64k limits for
allocation and performance reasons, but a double-indirect table of
128-bit (16-byte) hashes with 64kB chunks lets you address up to 1TB
of file data, and each doubling of the chunk size gives you 8 times
the storage space. Furthermore, the actual double-indirect tables for
an 8TB file using 128kB chunks would hold 64M hash entries (about 1GB
at 16 bytes each); for a more reasonable 4GB file with 32kB tables
(max of 128GB) it would be about 128k entries, or roughly 2MB of
indirect hash tables.
</Random Thought Experiment>
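
For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes 16-byte (128-bit) hashes and tables exactly one chunk in size; the function name double_indirect is made up for this example.

```python
HASH = 16  # assumed hash width in bytes (128 bits)


def double_indirect(chunk, file_size=None, hash_size=HASH):
    """Return (max_file_size, table_bytes) for a double-indirect layout.

    chunk is the size in bytes of both data chunks and hash tables.
    table_bytes counts the hash entries needed for file_size bytes
    (defaults to the maximum addressable size).
    """
    per_table = chunk // hash_size            # hashes per table
    max_size = per_table * per_table * chunk  # two levels of tables
    if file_size is None:
        file_size = max_size
    data_chunks = -(-file_size // chunk)        # ceiling division
    leaf_tables = -(-data_chunks // per_table)  # second-level tables
    # one hash per data chunk, plus one top-level hash per leaf table
    table_bytes = (data_chunks + leaf_tables) * hash_size
    return max_size, table_bytes


if __name__ == "__main__":
    for kb in (32, 64, 128):
        size, tables = double_indirect(kb * 1024)
        print(kb, "kB chunks:", size, "bytes max,", tables, "bytes of tables")
```

Doubling the chunk size doubles the hashes per table at both levels and doubles the data per chunk, which is where the factor of 8 comes from.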
Cheers,
Kyle Moffett