On Wed, Mar 21, 2012 at 6:10 AM, elton sky <eltonsky9404@xxxxxxxxx> wrote:
> From the idea, I realize the problem is that index is verified and
> rewritten on any operations which is unnecessary sometimes. And the
> objective is to reduce the number of operations to below logN. As I
> am new to git, I couldn't give a detailed plan to this for now. I
> should have gone through more documents or codes but there's only one
> week for application. So I have to jump up from nowhere :P

Understanding the current index format would be a good start, I think:
Documentation/technical/index-format.txt. For the index reading code,
look at read_index_from() in read-cache.c (most, if not all, index
manipulation is in this file).

> I got questions like: how each operations affect index?

For the writing part, commands that call refresh_index() can update
stat info for a large number of entries. git-add, git-update-index,
git-mv and git-rm can add/remove entries from the index. Merge/checkout
operations (git-reset, git-checkout, git-merge..) can rewrite the whole
index. I think this proposal aims to speed up refresh_index() and
add/remove operations, not the last one.

To speed up the reading part (you can grep for read_cache() to see how
many commands read the index), you may need to do something about the
index integrity check. Currently it calculates the SHA-1 of the entire
index, then checks it against the value stored at the end of the index.
Calculating SHA-1 can be really expensive on a big index.

> how cache tree data and index is stored?

The cache tree is stored as an optional index extension. It's also
documented in index-format.txt. Or you can look at cache-tree.[ch].
-- 
Duy
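
P.S. To make the integrity check above concrete, here is a rough Python sketch (not git's actual C code). It assumes only what index-format.txt describes: a 12-byte header starting with the signature "DIRC", and a trailing 20-byte SHA-1 computed over everything before it. The helper names build_toy_index() and verify_index_checksum() are made up for illustration, and the toy file is not a valid index, it only has the right header and checksum layout.

```python
import hashlib
import struct

SHA1_LEN = 20    # size of the trailing SHA-1 checksum
HEADER_LEN = 12  # "DIRC" + 4-byte version + 4-byte entry count


def build_toy_index(payload: bytes = b"") -> bytes:
    """Hypothetical helper: header + opaque payload + trailing SHA-1.

    The payload stands in for entries and extensions; it is NOT parsed.
    """
    header = b"DIRC" + struct.pack(">II", 2, 0)  # version 2, 0 entries
    body = header + payload
    return body + hashlib.sha1(body).digest()


def verify_index_checksum(data: bytes) -> bool:
    """Hash everything before the last 20 bytes and compare.

    The cost is linear in the file size, which is why this check
    hurts on a big index.
    """
    if len(data) < HEADER_LEN + SHA1_LEN or data[:4] != b"DIRC":
        return False
    body, stored = data[:-SHA1_LEN], data[-SHA1_LEN:]
    return hashlib.sha1(body).digest() == stored


if __name__ == "__main__":
    good = build_toy_index(b"pretend entries")
    bad = good[:HEADER_LEN] + b"X" + good[HEADER_LEN + 1:]  # flip one byte
    print(verify_index_checksum(good), verify_index_checksum(bad))
```

On a real repository you could try verify_index_checksum(open(".git/index", "rb").read()), assuming the index uses the SHA-1 trailer described in index-format.txt.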