On Tue, 27 Feb 2007, Linus Torvalds wrote:

>
> On Tue, 27 Feb 2007, Shawn O. Pearce wrote:
> >
> > We have thus far reformatted OBJ_TREEs with a new dictionary based
> > compression scheme.  In this scheme we pool the filenames and modes
> > that appear within trees into a single table within the packfile.
> > All trees are then converted to use a 22 byte record format:
> >
> >   - 2 byte network byte order index into the string pool
> >   - 20 byte SHA-1
>
> Umm. Am I missing something, or is this totally braindamaged?
>
> Are you really expecting there to never be more than 64k basenames? Trust
> me, that's a totally broken assumption. Anything that tracks generated
> stuff will _easily_ have several tens of thousands of random filenames
> even in a single tree, much less over the whole history of the repository.

The idea is to use this encoding only for tree objects whose entries all
fall within the 64K most frequently used base names, and to fall back to
the current tree object encoding for objects that can't be represented
that way.

For reference, the GIT tree itself has 585 unique names; the Linux kernel
has 12263.

If we eventually find that it is common, and performance critical, to
need more bits for those indices (because the number of unique path
components far exceeds that limit with an even distribution), then we
might just add another tree encoding with a 3-byte index for those trees.
In the end everything translates back to the same object.


Nicolas
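To make the scheme concrete, here is a minimal Python sketch of the 22-byte record format and of the fallback described above. It is an illustration under stated assumptions, not the actual packfile code: `build_pool`, `encode_tree`, `decode_tree` and the `(mode, name, sha1)` tuple layout are hypothetical, and the pool is assumed to be a plain frequency ranking of (mode, name) pairs capped at 64K entries.

```python
import struct
from collections import Counter

POOL_LIMIT = 1 << 16   # a 2-byte index can address at most 65536 pool entries
RECORD_SIZE = 22       # 2-byte pool index + 20-byte SHA-1


def build_pool(trees):
    """Hypothetical helper: rank (mode, name) pairs by how often they occur
    across all trees and keep only the 64K most frequent ones."""
    counts = Counter(entry[:2] for tree in trees for entry in tree)
    ranked = [key for key, _ in counts.most_common(POOL_LIMIT)]
    return {key: index for index, key in enumerate(ranked)}


def encode_tree(tree, pool):
    """Encode a tree as a run of 22-byte records. Returns None if any entry's
    (mode, name) is not pooled; the caller then keeps the current encoding."""
    out = bytearray()
    for mode, name, sha1 in tree:
        index = pool.get((mode, name))
        if index is None:
            return None
        # ">H" packs an unsigned 16-bit value in big-endian (network) order.
        out += struct.pack(">H", index) + sha1
    return bytes(out)


def decode_tree(data, pool):
    """Translate the records back into (mode, name, sha1) entries."""
    by_index = {index: key for key, index in pool.items()}
    entries = []
    for offset in range(0, len(data), RECORD_SIZE):
        (index,) = struct.unpack_from(">H", data, offset)
        mode, name = by_index[index]
        entries.append((mode, name, data[offset + 2:offset + RECORD_SIZE]))
    return entries
```

A tree that names anything outside the pool is simply left in the current encoding, so more than 64K distinct names never breaks anything; only trees using the rarest names miss out on the compact form. A 3-byte variant would need its own record size and manual packing, since `struct` has no 24-bit format code.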