Duy Nguyen <pclouds@xxxxxxxxx> writes: > I wonder if we need to update_main_cache_tree() so many times in this > function. If I read the code correctly, all roads must lead to > update_main_cache_tree(0) in prepare_to_commit(). I think prepare-to-commit is too late; it does not want to know if the index it was given to create a tree out of is the one that the user will keep using after this invocation of "git commit" or just a temporary one used for partial commit. The cache-tree rebuilt there is what is recorded with commit_tree_extended() in cmd_commit(), but if you are doing a partial commit, these generic code paths are given a temporary index file on the filesystem to work on, and cache-tree in that will not help user's later operation. For three main uses of "git commit", prepare_index() does these: - "git commit -a" and "git commit -i $paths" update the index with the new contents from the working tree, fully builds cache-tree in-core to write out the tree objects, and writes the index file to the filesystem. Because this index is the one used after this invocation of "git commit" returns, we have a fully populated cache-tree after this happens. This code path is perfect and there is no need to change. - "git commit" commits the contents of the index as-is, so technically there is no reason for it to update the index on the filesystem at all, but it refreshes the cached stat data to help the "status" part of the command, and if it finds that stat data was stale, updates the index on the filesystem because it is wasteful not to do so. As we would be spending I/O cycles to update the index file in that case anyway, we also rebuild the cache-tree and include that in the updated index. When the cached stat data was already up-to-date, however, we do not update the index on the filesystem, so the series under discussion will change the trade-off by doing one more I/O to write out a new index to the filesystem only to update cache-tree. - "git commit $paths" updates the "real" index with given $paths and writes it out to the filesystem first. This is the index the user will use after "git commit" finishes; traditionally our trade-off was "populate cache-tree only when we do not have to spend any cycle only to do so (i.e. when we are writing trees anyway, or when we read from a tree), and let it degrade as paths are added, removed and modified" and we avoided populating cache-tree in this codepath. The series under discussion will change the trade-off here, too. After it updates this "real" index, it builds another temporary index that represents the tree state to be committed (starting from HEAD and updates with the given $paths), but that will be discarded and we do not want to spend any extra cycle to do anything only to make its later use more efficient (like writing updated cache-tree to it). > If we find out that > we have incomplete cache-tree before that call, we could write the > index one more time after that call, and that will make an extra I/O only to write out cache-tree to the temporary index that we will discard immediately after for a partial commit. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html