Junio C Hamano <junkio@xxxxxxx> writes: > (1) When git-write-tree writes out trees from the index, we > store <directory, tree SHA1> pair for all the trees we > compute, in .git/index.aux file. There are two reasons I did not make this extra information part of the index file. The number one reason was because the current index file format is pretty dense, and I did not find an obvious hole in the front, in the middle or at the tail to sneak extra data in without upsetting existing code and without updating index file version. If this were a change to add a great new feature, that might have warranted bumping the version up, but the cache-tree is an optimization and if you lose that information, all you have lost is that now your write-tree needs to recompute the whole tree as before (IOW, not much). The second reason was I wanted to do this step-by-step, and wanted to do a demonstration of an end-to-end workflow that gains from this set of changes (apply followed by write-tree was an obvious minimum set of commands), and while doing so I did not have to disrupt other commands that are unaware of this extension. Having said that, cache-tree.c has an internal interface to serialize the cache-tree data in-core, primarily because I was unsure if I wanted to append this information somewhere inside the main index file or have it external when I did it; so we could later push it into the index if we wanted to. Ideally, everybody who writes index should be converted to update cache-tree to help eventual write-tree. Right now, if a command updates index without updating a matching cache-tree, the whole thing is invalidated; this way, you do not risk using stale "cached" data. Currently the command that primes the cache-tree is write-tree. This may be counterintuitive -- even I myself would expect that read-tree would prime it, and various index-updaters invalidate subtrees they touch, and write-tree to use the surviving parts to speed up what it needs to do, and write out an updated, fully-vaild cache-tree. I did not do it only because it was not necessary for "apply then write-tree" cycle, but read-tree should be taught about cache-tree to help others, _and_ to help itself. When "read-tree -m O A B" merges three trees, we iterate over all index entries, even when only a small part of the tree has changed. This could be helped in a big way if the current index has valid cache-tree information for the parts unaffected by the merge. If all three trees have identical tree in higher level subdirectory (e.g. "fs/" in the kernel source), and if the index has not touched anything in "fs/" since it read from our tree (i.e. "A"), then we do not even have to descend into that directory in the working tree to see which index entries are dirty. We can just keep index entries that begin with "fs/" intact, keep the cache-tree entry for that directory as it was read from "A". This would be a big win -- we do not have to read tree objects under "fs/" (there are 62 trees under "fs/", so we save uncompressing and undeltifying 180 objects). This operation would need to invalidate entries in cache-tree that are involved in the actual merge. Branch-switching two-tree form "read-tree -m OLD NEW" probably can benefit from the same optimization. Obviously, a single-tree form of read-tree should be able to prime the cache-tree with fully valid data before writing the index out. Having said that, I am not touching read-tree myself for now; I am lazy. - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html