On Wed, 2014-05-14 at 17:36 +0700, Duy Nguyen wrote:
> >> With that in
> >> mind, I think you don't need to keep a fs cache on disk at all. All
> >> you need to store (in the index) is the clock value from watchman.
> >> After we parse the index, we perform a "since" query to get path names
> >> (and perhaps "exists" and "mode" for later). Then we set CE_VALID bit
> >> on entries that are _not_ in the query result. The remaining items
> >> will be lstat'd by git (see [1] and read-cache.c changes on the next
> >> few patches). Assuming the number of updated files is reasonably
> >> small, we won't be punished by lookup time.
> >
> > I considered this, but rejected it for a few reasons:
> > 1. We still need to know about the untracked files. I guess we could
> > do an index extension for just that, like your untracked cache.
>
> Yes. But consider that the number of untracked files is usually small
> (otherwise 'git status' would look very messy). And your fscache would
> need to store excluded file list too, which could be a lot bigger (one
> pair of .[ch] -> one .o). _But_ yours would make 'git status
> --ignored' work. I don't consider that a major use case for
> optimization, but people may have different opinions.

I don't think --ignored is a major use case. But I do think it's nice
to get as much mileage as possible out of a change like this.

> > 2. That doesn't eliminate opendir/readdir, I think. Those are a major
> > cost. I did not thoroughly review your patches from January/February,
> > but I didn't notice anything in there about this part of dir.c.
> > Also the cost of open(nonexistent .gitignore) to do ignore processing.
>
> Assuming untracked cache is in use, opendir/readdir is replaced with
> lstat. And cheap lstat can be solved with watchman without storing
> anything extra. I solve open(.gitignore) too by checking its stat data
> with the one in index. If matches, I reuse the version in tree.
> This does not necessarily make it cheaper, but it increases locality so
> it might be. _Modified_ .gitignore files have to go through
> open(.gitignore), but people don't update .gitignore often.

Interesting -- if all that actually works, then it's probably the right
approach.

> > 3. Changing a file in the index (e.g. git add) requires clearing
> > the CE_VALID bit; this means that third-party libraries (e.g. jgit)
> > that change the git repo need to understand this extension for correct
> > operation. Maybe that's just the nature of extensions, but it's not
> > something that my present patch set requires.
>
> I don't store CE_VALID on disk. Instead I store a new bit CE_WATCHED.
> Only after loading and validating against watchman that I turn
> CE_WATCHED to CE_VALID in memory. So JGit, libgit2 are not confused.
>
> I assume you won't change your mind about this. Which is fine to me. I
> will still try out my approach with your libwatchman though. Just
> curious about its performance and complexity, compared to your
> approach.

I am open-minded here. This code is really the first time I have looked
at git's internals, and I accept that your way might be better. If
you're going to try the watchman version of your approach, then we can
do a direct comparison. Let me know if there is something I can do to
help out on that.

> A bit off topic, but msys fork has another "fscache" in compat/win32.
> If you could come up with a different name, maybe it'll be less
> confusing for them after merging. But this is not a big deal, as this
> fscache won't work on windows anyway.

Does wtcache sound like a better name?
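
For anyone following the thread, the flow Duy describes (keep only a CE_WATCHED bit on disk, run a watchman "since" query after loading the index, and treat every watched entry the query does not mention as CE_VALID in memory) might look roughly like the sketch below. This is an illustration under assumptions, not code from either patch set: the flag values are placeholders, the index is modelled as a plain dict, and apply_since_result is a hypothetical helper that takes the query result as a list of paths instead of talking to watchman.

```python
# Placeholder bit values for this sketch only; they are NOT git's real layout.
CE_WATCHED = 0x1  # stored on disk: entry was clean when the clock was saved
CE_VALID = 0x2    # set in memory only: git may skip lstat() for this entry


def apply_since_result(entries, changed_paths):
    """Turn CE_WATCHED into CE_VALID for entries the query did not report.

    entries: dict of path -> flag bits, standing in for the parsed index.
    changed_paths: paths a (hypothetical) watchman "since" query returned.
    Returns the sorted paths that still have to be lstat'd.
    """
    changed = set(changed_paths)
    need_lstat = []
    for path, flags in entries.items():
        if flags & CE_WATCHED and path not in changed:
            # Watched and unchanged since the saved clock: trust the index.
            entries[path] = flags | CE_VALID
        else:
            # Either reported as changed or never watched: check it on disk.
            need_lstat.append(path)
    return sorted(need_lstat)


index = {"Makefile": CE_WATCHED, "dir.c": CE_WATCHED, "new.c": 0}
to_check = apply_since_result(index, ["dir.c"])
```

With this toy index, only dir.c (reported by the query) and new.c (never watched) remain to be lstat'd, which is the "reasonably small" set the proposal relies on. Because CE_VALID is never written back, a tool that does not know about the extension would only see an extra bit it can ignore.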