Re: [PATCH 01/02/RFC] implement a stat cache

On Sun, 20 Apr 2008, Linus Torvalds wrote:
> 
> I do agree that actually actively removing stat calls requires a lot more 
> subtle interactions. We almost always *have* the stat information in the 
> index, but the problem with "git status ." is that we re-read the index so 
> many times (and then have to re-validate the stat info).

Actually, looking closer, one of the issues seems to be not just the fact 
that we throw out the index by re-reading it, but run_diff_files() does

		...
		if (ce_uptodate(ce))
			continue;

		changed = check_work_tree_entity(ce, &st, symcache);
		if (changed) {
			...

where that "check_work_tree_entity()" check is very expensive for deep 
directory structures, because it ends up checking the stat() information 
for every single directory leading up to it.

There's some bug there, because it really shouldn't do that.

This causes lstat() patterns like

	..
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean/15.6.4.2-2.js", {st_mode=S_IFREG|0664, st_size=3197, ...}) = 0
	lstat("JavaScriptCore", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	..

i.e., instead of doing just *one* lstat() (on that file), it does six: one
on the file itself, and one on each of the five directories leading up to it!

This is the *real* cause of WebKit having ~7 lstat() calls per file in the
repository - if it weren't for this braindamage, we'd have just three
lstat() calls per file for "git status .".
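A minimal sketch of the per-entry check that the strace output implies, assuming it lstat()s the entry and then every leading directory from the top down, with nothing remembered between entries. This is not git's code: `naive_check`, `lstat_fn`, `counting_lstat` and `fake_symlink` are hypothetical names, and the stat function is passed in as a pointer so a counting stub can stand in for the filesystem.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/stat.h>

/* Hypothetical: the stat function is injected so a stub can count calls. */
typedef int (*lstat_fn)(const char *path, struct stat *st);

static int lstat_calls;			/* calls seen by the stub */
static const char *fake_symlink;	/* path the stub reports as a symlink, or NULL */

/* Stub standing in for lstat(): counts calls and never touches the disk. */
static int counting_lstat(const char *path, struct stat *st)
{
	lstat_calls++;
	memset(st, 0, sizeof(*st));
	if (fake_symlink && !strcmp(path, fake_symlink))
		st->st_mode = S_IFLNK | 0777;
	else
		st->st_mode = S_IFDIR | 0775;	/* good enough for counting */
	return 0;
}

/*
 * Naive per-entry check (sketch): lstat() the entry itself, then every
 * leading directory from the top down, with no memory of earlier entries.
 * Returns 1 if the entry is missing or sits behind a symlink, 0 otherwise.
 */
static int naive_check(const char *name, lstat_fn do_lstat)
{
	char dir[4096];
	struct stat st;
	const char *slash;

	if (do_lstat(name, &st))
		return 1;

	for (slash = strchr(name, '/'); slash; slash = strchr(slash + 1, '/')) {
		size_t len = (size_t)(slash - name);

		if (len >= sizeof(dir))
			return 1;	/* too long for this sketch */
		memcpy(dir, name, len);
		dir[len] = '\0';
		if (do_lstat(dir, &st) || S_ISLNK(st.st_mode))
			return 1;
	}
	return 0;
}
```

For the path in the trace above, the stub sees six calls: one for the file and one for each of its five leading directories, and the whole walk is repeated for the next file.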

What's really sad is how we do this for every file in a directory, so the 
pattern actually ends up looking like

	...
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean/15.6.4.1.js", {st_mode=S_IFREG|0664, st_size=2164, ...}) = 0
	lstat("JavaScriptCore", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean/15.6.4.2-1.js", {st_mode=S_IFREG|0664, st_size=5219, ...}) = 0
	lstat("JavaScriptCore", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean/15.6.4.2-2.js", {st_mode=S_IFREG|0664, st_size=3197, ...}) = 0
	lstat("JavaScriptCore", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	lstat("JavaScriptCore/tests/mozilla/ecma/Boolean", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	...

i.e., for deep directories with lots of files in them, we end up doing an
lstat() on all the directories leading up to that directory over and over
and over again - once for each file in that directory.

Oops.

We're supposed to have that "char *symcache" thing to not do that, but it 
doesn't actually work that way.
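One way a cache like symcache could avoid the repeated calls, sketched under assumptions rather than taken from git: remember the last leading directory already verified not to be a symlink, and for the next entry lstat() only the components past the part it shares with that directory. Because index entries are sorted by path, files in one directory mostly come one after another, so the shared part is usually most or all of the directory. `struct dir_cache`, `shared_prefix`, `cached_check` and the stub `counting_lstat` are hypothetical names, and the sketch assumes nothing in the work tree changes during the pass.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/stat.h>

typedef int (*lstat_fn)(const char *path, struct stat *st);

/* Hypothetical cache: last leading directory verified not to be a symlink. */
struct dir_cache {
	char path[4096];
	size_t len;		/* 0 means "nothing cached" */
};

static int lstat_calls;			/* calls seen by the stub */
static const char *fake_symlink;	/* path the stub reports as a symlink, or NULL */

/* Stub standing in for lstat(): counts calls and never touches the disk. */
static int counting_lstat(const char *path, struct stat *st)
{
	lstat_calls++;
	memset(st, 0, sizeof(*st));
	st->st_mode = (fake_symlink && !strcmp(path, fake_symlink))
		? (S_IFLNK | 0777) : (S_IFDIR | 0775);
	return 0;
}

/* Length of the longest leading directory dir[0..dlen) shares with the cache. */
static size_t shared_prefix(const struct dir_cache *c, const char *dir, size_t dlen)
{
	size_t n = c->len < dlen ? c->len : dlen;
	size_t i, best = 0;

	for (i = 0; i < n && c->path[i] == dir[i]; i++)
		if (dir[i] == '/')
			best = i;	/* both paths have a directory boundary here */
	if (i == n && (i == c->len || c->path[i] == '/') &&
	    (i == dlen || dir[i] == '/'))
		best = i;		/* one directory is a prefix of the other */
	return best;
}

/*
 * Returns 1 if the entry is missing or sits behind a symlink, 0 otherwise,
 * lstat()ing only the leading directories the cache does not already cover.
 */
static int cached_check(const char *name, struct dir_cache *c, lstat_fn do_lstat)
{
	struct stat st;
	const char *last = strrchr(name, '/');
	size_t dlen, pos;

	if (do_lstat(name, &st))
		return 1;
	if (!last)
		return 0;		/* top-level entry: no leading directories */
	dlen = (size_t)(last - name);
	if (dlen >= sizeof(c->path))
		return 1;		/* too long for this sketch */

	pos = c->len = shared_prefix(c, name, dlen);
	while (pos < dlen) {
		/* next component ends at the next '/'; name[dlen] is one */
		size_t end = (size_t)(strchr(name + pos + 1, '/') - name);

		memcpy(c->path + pos, name + pos, end - pos);
		c->path[end] = '\0';
		if (do_lstat(c->path, &st) || S_ISLNK(st.st_mode))
			return 1;	/* cache keeps only the part verified so far */
		c->len = pos = end;
	}
	return 0;
}
```

Fed the three Boolean entries from the trace above, the stub sees 6 + 1 + 1 = 8 calls instead of the 18 the naive walk makes; moving on to a sibling directory then costs one call for the entry and one for the new directory.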

Junio, what was the logic for that whole "has_symlink_leading_path()" 
thing? I forget. Whatever, it's broken. 

		Linus