On Wed, May 14, 2014 at 6:44 AM, David Turner <dturner@xxxxxxxxxxxxxxxx> wrote: >> I'm not so happy that git now needs to link to libjansson.so and >> libwatchman.so. I would load libwatchman.so at runtime only when >> core.usewatchman is on, but this is more of personal taste. > > I assume you mean something with dlopen? I don't really like that because > (a) nothing else in git works that way and > (b) you would still need the libwatchman headers to build git (or the > structs could be copied, but that is ugly). And because we don't link to libdl.so anyway, using dlopen means trading libjansson.so and libwatchman.so for libdl.so. And it's uglier. So forget what I wrote. >> With that in >> mind, I think you don't need to keep a fs cache on disk at all. All >> you need to store (in the index) is the clock value from watchman. >> After we parse the index, we perform a "since" query to get path names >> (and perhaps "exists" and "mode" for later). Then we set CE_VALID bit >> on entries that are _not_ in the query result. The remaining items >> will be lstat'd by git (see [1] and read-cache.c changes on the next >> few patches). Assuming the number of updated files is reasonably >> small, we won't be punished by lookup time. > > I considered this, but rejected it for a few reasons: > 1. We still need to know about the untracked files. I guess we could > do an index extension for just that, like your untracked cache. Yes. But consider that the number of untracked files is usually small (otherwise 'git status' would look very messy). And your fscache would need to store excluded file list too, which could be a lot bigger (one pair of .[ch] -> one .o). _But_ yours would make 'git status --ignored' work. I don't consider that a major use case for optimization, but people may have different opinions. > 2. That doesn't eliminate opendir/readdir, I think. Those are a major > cost. I did not thoroughly review your patches from January/February, > but I didn't notice anything in there about this part of dir.c. > Also the cost of open(nonexistent .gitignore) to do ignore processing. Assuming untracked cache is in use, opendir/readdir is replaced with lstat. And cheap lstat can be solved with watchman without storing anything extra. I solve open(.gitignore) too by checking its stat data with the one in index. If matches, I reuse the version in tree. This does not necessarily make it cheaper, but it increases locality so it might be. _Modified_ .gitignore files have to go through open(.gitignore), but people don't update .gitignore often. > 3. Changing a file in the index (e.g. git add) requires clearing > the CE_VALID bit; this means that third-party libraries (e.g. jgit) > that change the git repo need to understand this extension for correct > operation. Maybe that's just the nature of extensions, but it's not > something that my present patch set requires. I don't store CE_VALID on disk. Instead I store a new bit CE_WATCHED. Only after loading and validating against watchman that I turn CE_WATCHED to CE_VALID in memory. So JGit, libgit2 are not confused. I assume you won't change your mind about this. Which is fine to me. I will still try out my approach with your libwatchman though. Just curious about its performance and complexity, compared to your approach. A bit off topic, but msys fork has another "fscache" in compat/win32. If you could come up with a different name, maybe it'll be less confusing for them after merging. But this is not a big deal, as this fscache won't work on windows anyway. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html