On Sun, 20 Apr 2008, Linus Torvalds wrote:
>
> I do agree that actually actively removing stat calls requires a lot more
> subtle interactions. We almost always *have* the stat information in the
> index, but the problem with "git status ." is that we re-read the index so
> many times (and then have to re-validate the stat info).

Actually, looking closer, one of the issues seems to be not just the fact
that we throw out the index by re-reading it, but run_diff_files() does

	...
	if (ce_uptodate(ce))
		continue;
	changed = check_work_tree_entity(ce, &st, symcache);
	if (changed) {
	...

where that "check_work_tree_entity()" check is very expensive for deep
directory structures, because it ends up checking the stat() information
for every single directory leading up to it. There's some bug there,
because it really shouldn't do that.

This causes lstat() patterns like

	..
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean/15.6.4.2-2.js", {st_mode=S_IFREG|0664, st_size=3197, ...}) = 0
	lstat("JavaScriptCore", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	..

ie instead of doing just *one* lstat (on that file), it does six: the file
itself, and the five directories leading up to it!

This is the *real* cause of WebKit having ~7 lstat's per file in the
repository - if it wasn't for this braindamage, we'd have just three
lstat's per file for "git status .".

What's really sad is how we do this for every file in a directory, so the
pattern actually ends up looking like

	...
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean/15.6.4.1.js", {st_mode=S_IFREG|0664, st_size=2164, ...}) = 0
	lstat("JavaScriptCore", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean/15.6.4.2-1.js", {st_mode=S_IFREG|0664, st_size=5219, ...}) = 0
	lstat("JavaScriptCore", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean/15.6.4.2-2.js", {st_mode=S_IFREG|0664, st_size=3197, ...}) = 0
	lstat("JavaScriptCore", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	...

ie for deep directories with lots of files in them, we end up doing an
lstat() on all the directories leading up to that directory over and over
and over again - for each file in that directory.

Oops. We're supposed to have that "char *symcache" thing to not do that,
but it doesn't actually work that way.

Junio, what was the logic for that whole "has_symlink_leading_path()"
thing? I forget.
Whatever, it's broken.

		Linus
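
To make the repeated directory lstat()s described above concrete, here is a rough sketch of what a working leading-path cache could look like. It is NOT git's actual has_symlink_leading_path() or symcache code: the struct dir_cache and the functions dir_cache_init(), verified_prefix_len() and has_bad_leading_dir() are hypothetical names made up for illustration, and the sketch assumes normalized paths (no "//" or "." components). The idea is just to remember the longest leading directory that has already been verified, so the next file in the same directory costs zero extra lstat()s.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical cache: the longest leading directory verified so far. */
struct dir_cache {
	char path[PATH_MAX];       /* e.g. "a/b/c", no trailing slash */
	size_t len;                /* strlen(path), 0 when empty */
	unsigned long lstat_calls; /* instrumentation only */
};

static void dir_cache_init(struct dir_cache *c)
{
	c->path[0] = '\0';
	c->len = 0;
	c->lstat_calls = 0;
}

/*
 * Length of the longest prefix of dir[0..dirlen) that ends on a path
 * component boundary and is shared with the cached path. Everything up
 * to that point was already verified by an earlier call.
 */
static size_t verified_prefix_len(const struct dir_cache *c,
				  const char *dir, size_t dirlen)
{
	size_t i = 0, last = 0;

	while (i < c->len && i < dirlen && c->path[i] == dir[i]) {
		if (dir[i] == '/')
			last = i;
		i++;
	}
	/* Both strings end a component at i: the match is complete. */
	if ((i == c->len || c->path[i] == '/') &&
	    (i == dirlen || dir[i] == '/'))
		last = i;
	return last;
}

/*
 * Returns 1 if some leading directory of 'name' is missing, a symlink,
 * or otherwise not a real directory; 0 if they are all directories.
 */
static int has_bad_leading_dir(struct dir_cache *c, const char *name)
{
	const char *slash = strrchr(name, '/');
	size_t dirlen = slash ? (size_t)(slash - name) : 0;
	size_t pos;
	char buf[PATH_MAX];
	struct stat st;
	int bad = 0;

	assert(dirlen < PATH_MAX);
	pos = verified_prefix_len(c, name, dirlen);

	/* lstat() only the components beyond the verified prefix. */
	while (pos < dirlen) {
		const char *next = memchr(name + pos + 1, '/', dirlen - pos - 1);
		size_t end = next ? (size_t)(next - name) : dirlen;

		memcpy(buf, name, end);
		buf[end] = '\0';
		c->lstat_calls++;
		if (lstat(buf, &st) < 0 || !S_ISDIR(st.st_mode)) {
			bad = 1;
			break;
		}
		pos = end;
	}

	/* Remember whatever prefix is known to be good. */
	memcpy(c->path, name, pos);
	c->path[pos] = '\0';
	c->len = pos;
	return bad;
}
```

With something like this, the first file under JavaScriptCore/tests/mozilla/ecma/Boolean would cost five directory lstat()s, and every following file in that directory would cost none, because verified_prefix_len() already covers the whole leading path. Moving to a sibling directory only re-checks the components that differ.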