On Sun, 20 Apr 2008, Luciano Rocha wrote:
>
> That's a lot. Why not use a stat cache?

Well, the thing is, the OS _does_ a stat cache for us, and the one that the OS maintains is a lot better, in that it works across processes and is coherent with other processes changing things.

And the thing is, your stat cache makes the *common* cases slower. I didn't do a whole lot of testing, but on my machine, doing just a "git status" with and without your stat cache shows

Current git 'master':

  real 0m0.302s
  real 0m0.308s
  real 0m0.314s

With your patch:

  real 0m0.352s
  real 0m0.354s
  real 0m0.355s

IOW, it slowed down the case that I think matters more (the one you're *supposed* to use, and people most commonly do) by 15%.

Now, admittedly, I also do think that we should generally optimize the slow cases more than we should care about things that are already very fast, so I do not think that it's wrong to say "ok, let's make the really fast case a bit slower, in order to not be so slow in the bad case", so in that sense I do not think the slowdown is disastrous.

BUT.

I really dislike adding a cache that is there just because we do something stupid. We can fix the over-abundance of lstat() calls by just being smarter. And the smarter we are, the less the cache will help, and the more it will hurt.

Which is the real reason why I think the cache is a really really bad idea: it optimizes for the wrong kind of behavior.

So we have other caches and hashes we use, like the index itself, or the name lookup hash into the index, or the delta cache. Maintaining those caches takes some effort too, but those caches aren't there because we're doing something stupid, they are there because they allow us to do something smart.

For example, the index itself actually has really important semantic characteristics.
And while the name hashing actually improves on index lookup performance, I'd never have implemented it if it wasn't for the fact that it was also designed to allow us to do case-insensitive lookups. And the delta cache is not hiding stupidity, it's literally avoiding very expensive work that we can't avoid by being smarter.

So the stat cache is not horribly bad, but I think it's the wrong path to go down.

> With these changes, my git status . in WebKit changes from 28.215s to
> 15.414s.

Of course, one reason I don't think it's such a great idea is that on Linux, your stat cache doesn't even end up helping _nearly_ as much as it does on OS X. You see an almost 50% improvement, so the 15% *deprovement* may not sound like much to you. But under Linux, the numbers are quite different:

"git status ." with your patch:

  real 0m1.043s
  real 0m1.009s
  real 0m0.972s

With my trivial patch that just removed 2 of the 9 lstat calls:

  real 0m1.116s
  real 0m1.115s
  real 0m1.119s

IOW, it does help the "." case on Linux, but only by a fairly small amount. In fact, the improvement seems slightly smaller than the performance degradation (~12% vs ~15%), but that is probably within the margin of noise, so...

So another reason to avoid the stat cache is that it's really just working around an OS X deficiency. I'd rather work at avoiding more lstat calls. I know we can do it.

		Linus