Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes:

>               first run   second (cached) run
> gentoo-x86     500 ms        71.6 ms
> wine           140 ms        9.72 ms
> webkit         125 ms        6.88 ms
> linux-2.6      106 ms        16.2 ms
>
> Basically untracked time is cut to one tenth in the best case
> scenario. The final numbers would be a bit higher because I haven't
> stored or read the cache from index yet. Real commit message follows..

As you allude to later with "if you recompile a single file, the whole
hierarchy in that directory is lost", two back-to-back runs of "git
status" are not very interesting.

> - The list of files and directories of the directory in question
> - The $GIT_DIR/index
> - The content of $GIT_DIR/info/exclude
> - The content of core.excludesfile
> - The content (or the lack) of .gitignore of all parent directories
>   from $GIT_WORK_TREE
>
> If we can cheaply validate all those inputs for a certain directory,
> we are sure that the current code will always produce the same
> results, so we can cache and reuse those results.
>
> This is not a silver bullet approach. When you compile a C file, for
> example, the old .o file is removed and a new one with the same name
> created, effectively invalidating the containing directory's
> cache. But at least with a large enough work tree, there could be many
> directories you never touch. The cache could help there.
>
> The first input can be checked using directory mtime. In many
> filesystems, directory mtime is updated when direct files/dirs are
> added or removed (*).

An important thing is that creation of new cruft or deletion of
existing cruft can be detected without any false negatives by this
mechanism, and mtime on the directory would be a good way to check it.

> The second one can be hooked from read-cache.c. Whenever a file (or a
> submodule) is added or removed from a directory, we invalidate that
> directory. This will be done in a later patch.

I would imagine that it would be done at the same places as we
invalidate cache-trees, with the same "invalidation percolates up"
logic.

> On subsequent runs, read_directory_recursive() reads stat info of the
> directory in question and verifies if files/dirs have been added or
> removed.

Hmph. If you have a two-level hierarchy D1/D2 and you change the list
of cruft in D2 but not in D1, the mtime of D1/D2 changes but not the
mtime of D1, as you observed below.

> With the help of prep_exclude() to verify .gitignore chain,
> it may decide "all is well" and enable the fast path in
> treat_path(). read_directory_recursive() is still called for
> subdirectories even in fast path, because a directory mtime does not
> cover all subdirs recursively.

I wonder if you can avoid recursing into D1 when no cached mtime (and
.gitignore) information has changed in any subdirectory of it
(e.g. both D1 and D1/D2 match the cache).
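
To illustrate the two ideas above (invalidation that percolates up, and
skipping the recursion when D1 and everything cached under it still
match), here is a rough sketch. It is not code from the patch or from
git: "struct dir_cache" and the dir_cache_*() helpers are hypothetical
names made up for this illustration. It assumes one cached mtime per
directory, compares at stat() granularity only, and leaves out the
.gitignore chain check entirely.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical cache node for one directory; not git's actual layout. */
struct dir_cache {
	const char *path;            /* path handed to stat() */
	time_t mtime;                /* directory mtime when results were cached */
	int valid;                   /* cleared when the index changes underneath */
	struct dir_cache *parent;    /* NULL for the top-level directory */
	struct dir_cache **subdirs;  /* cached subdirectories, may be NULL */
	size_t nr_subdirs;
};

/* Index-side invalidation: mark this directory and every ancestor stale. */
static void dir_cache_invalidate(struct dir_cache *d)
{
	for (; d; d = d->parent)
		d->valid = 0;
}

/* One directory is fresh if it was not invalidated and its mtime matches. */
static int dir_cache_is_fresh(const struct dir_cache *d)
{
	struct stat st;

	if (!d->valid)
		return 0;
	if (stat(d->path, &st) < 0 || !S_ISDIR(st.st_mode))
		return 0;
	return st.st_mtime == d->mtime;
}

/*
 * A whole subtree is fresh only if every cached directory in it is fresh.
 * This still costs one stat() per cached directory, but no readdir().
 */
static int dir_cache_subtree_is_fresh(const struct dir_cache *d)
{
	size_t i;

	if (!dir_cache_is_fresh(d))
		return 0;
	for (i = 0; i < d->nr_subdirs; i++)
		if (!dir_cache_subtree_is_fresh(d->subdirs[i]))
			return 0;
	return 1;
}
```

With something like this, a caller could consult
dir_cache_subtree_is_fresh() for D1 and reuse the cached results for
the whole hierarchy when it returns true, falling back to the normal
per-directory walk otherwise. Whether the saved readdir() calls are
worth the extra bookkeeping would have to be measured.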