First of all, thanks for pointing me to many more big repos. I'll look at them tomorrow.

End-of-day report (or ranting :D) time. The series now looks good enough for public eyes. I haven't run the test suite with untracked cache on by default, so my confidence level is not very high. That said, I suspect the racy timestamp issue will practically disable the cache there anyway.

The idea is the same as before: exploit directory mtime to cache untracked files. MSDN tells me NTFS on Windows exposes the "good" dir mtime behavior, which means this series could speed up Git on Windows (I think Karsten's fscache only deals with slow lstat, not with untracked files). It would be nice if Windows people could try it and confirm this. This could be a point in favor of untracked cache over watchman (no Windows support, last time I checked).

Usage is very simple: run "git update-index --untracked-cache" and you're ready. Use --no-untracked-cache to turn it off again.

The performance numbers on webkit look good if we focus on read_directory time only. Normally it takes 890ms. The first run with untracked cache goes up to 922ms (filling up the cache, not counting index write time). The next run goes down to 184ms (best case), a gain of about 80%. lstat on directories costs only about 20ms of those 184ms, so most of the remaining time is spent elsewhere and I still need to see if I can bring that number down further.

The "git status" performance gain is less impressive, of course: only about 38%, with refresh time now becoming the top offender. With core.preloadindex on, the gain increases to 50%. I think there's still room to maybe reach 65% by reducing read time.

But again, we may not stay in the best case forever. The more directories are damaged (their cached data invalidated), the slower it gets. At the far end of the spectrum, when all directories are damaged, we gain nothing but overhead. This is actually where watchman shines, although projects that take that approach may still need some improvements.
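To make the directory-mtime idea a bit more concrete, here is a minimal C sketch of the per-directory check. It assumes a hypothetical `struct cached_dir` and a hypothetical `cached_dir_is_valid()` helper; neither is the series' real data structure or API (the real data lives in the UNTR index extension). A cached listing is reused only if the directory's current mtime matches the recorded one and that mtime is strictly older than the moment the cache was written. Otherwise a change within the same timestamp tick could go unnoticed, which is the racy timestamp problem mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical cache entry for one directory. This is NOT git's real
 * struct untracked_cache_dir; field names are illustrative only.
 */
struct cached_dir {
	const char *path;
	struct timespec mtime;  /* dir mtime when it was last scanned */
	const char **untracked; /* untracked names found by that scan */
	size_t nr_untracked;
};

/* Compare two timestamps: returns <0, 0 or >0, like strcmp(). */
static int timespec_cmp(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/*
 * Decide whether the cached listing of a directory can be reused
 * instead of calling opendir()/readdir() again.
 *
 * cur_mtime:  the directory's mtime as returned by lstat() now
 * cache_time: when the cache was written (e.g. the index file mtime)
 *
 * Returns 1 if the cached listing is trusted, 0 if the directory
 * must be rescanned.
 */
int cached_dir_is_valid(const struct cached_dir *cd,
			const struct timespec *cur_mtime,
			const struct timespec *cache_time)
{
	assert(cd && cur_mtime && cache_time);

	/* Adding, removing or renaming an entry bumps the dir mtime. */
	if (timespec_cmp(&cd->mtime, cur_mtime) != 0)
		return 0;

	/*
	 * Racy case: the directory was modified at (or after) the moment
	 * the cache was written, so a later change in the same timestamp
	 * tick would leave mtime unchanged. Don't trust it.
	 */
	if (timespec_cmp(&cd->mtime, cache_time) >= 0)
		return 0;

	return 1;
}
```

For example, a directory recorded at mtime 100s with a cache written at 200s is reusable as long as its mtime is still 100s, while a cache written at the very instant of the last directory change forces a rescan. Note that directory mtime only reflects entries being added, removed or renamed, so .gitignore changes need a check of their own.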
Another bad point for untracked cache is that the extension data is so specific to core git's algorithm that it probably cannot be reused by other implementations. Again, watchman shines here.

Last note: this series conflicts with split-index due to the write_cache API change, so it's not a candidate for 'pu' yet.

The series can also be fetched from https://github.com/pclouds/git/commits/untracked-cache, except for the last few timing/experimental patches, which are not part of this series.

Nguyễn Thái Ngọc Duy (20):
  dir.c: coding style fix
  dir.h: move struct exclude declaration to top level
  prep_exclude: remove the artificial PATH_MAX limit
  dir.c: optionally compute sha-1 of a .gitignore file
  untracked cache: record .gitignore information and dir hierarchy
  untracked cache: initial untracked cache validation
  untracked cache: invalidate dirs recursively if .gitignore changes
  untracked cache: record/validate dir mtime and reuse cached output
  untracked cache: mark what dirs should be recursed/saved
  untracked cache: don't open non-existent .gitignore
  untracked cache: save to an index extension
  untracked cache: load from UNTR index extension
  untracked cache: invalidate at index addition or removal
  untracked cache: print untracked statistics with $GIT_TRACE_UNTRACKED
  read-cache.c: split racy stat test to a separate function
  untracked cache: avoid racy timestamps
  status: support untracked cache
  update-index: manually enable or disable untracked cache
  update-index: test the system before enabling untracked cache
  t7063: tests for untracked cache

 .gitignore                                 |   1 +
 Makefile                                   |   1 +
 builtin/commit.c                           |   8 +
 builtin/update-index.c                     | 161 ++++++
 cache.h                                    |   5 +
 dir.c                                      | 853 +++++++++++++++++++++++++++--
 dir.h                                      | 120 +++-
 read-cache.c                               |  51 +-
 t/t7063-status-untracked-cache.sh (new +x) | 352 ++++++++++++
 test-dump-untracked-cache.c (new)          |  61 +++
 unpack-trees.c                             |   7 +-
 wt-status.c                                |   6 +
 12 files changed, 1537 insertions(+), 89 deletions(-)
 create mode 100755 t/t7063-status-untracked-cache.sh
 create mode 100644 test-dump-untracked-cache.c
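Several commits in the list record .gitignore information and invalidate directories recursively when a .gitignore changes. Below is a minimal C sketch of that rule, assuming a hypothetical `struct dir_node` tree and a hypothetical `check_gitignore()` helper whose caller has already computed the current SHA-1 of the file (the hashing itself is left out). Because patterns in a .gitignore apply to every directory beneath it, a changed hash invalidates the whole subtree, not just the one directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20 /* SHA-1 digest size in bytes */

/* Hypothetical node in a cached directory tree (illustrative only). */
struct dir_node {
	const char *name;
	unsigned char gitignore_hash[HASH_LEN]; /* recorded .gitignore hash */
	int valid;                              /* 1 if cached listing is reusable */
	struct dir_node **children;
	size_t nr_children;
};

/* Mark a directory and everything below it as needing a rescan. */
static void invalidate_recursive(struct dir_node *dir)
{
	size_t i;

	dir->valid = 0;
	for (i = 0; i < dir->nr_children; i++)
		invalidate_recursive(dir->children[i]);
}

/*
 * Compare the recorded .gitignore hash with the current one. On a
 * mismatch, store the new hash for the next run and invalidate the
 * whole subtree, since the exclude rules apply to all descendants.
 * Returns 1 if the subtree was invalidated, 0 otherwise.
 */
int check_gitignore(struct dir_node *dir, const unsigned char *current_hash)
{
	assert(dir && current_hash);

	if (!memcmp(dir->gitignore_hash, current_hash, HASH_LEN))
		return 0;
	memcpy(dir->gitignore_hash, current_hash, HASH_LEN);
	invalidate_recursive(dir);
	return 1;
}
```

Sibling directories are left alone: only the directory whose .gitignore changed and its descendants lose their cached listings.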
-- 
1.9.1.346.ga2b5940