Re: Watchman support for git

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 14, 2014 at 6:44 AM, David Turner <dturner@xxxxxxxxxxxxxxxx> wrote:
>> I'm not so happy that git now needs to link to libjansson.so and
>> libwatchman.so. I would load libwatchman.so at runtime only when
>> core.usewatchman is on, but this is more of personal taste.
>
> I assume you mean something with dlopen? I don't really like that because
> (a) nothing else in git works that way and
> (b) you would still need the libwatchman headers to build git (or the
> structs could be copied, but that is ugly).

And because we don't link to libdl.so anyway, using dlopen means
trading libjansson.so and libwatchman.so for libdl.so. And it's
uglier. So forget what I wrote.

>>  With that in
>> mind, I think you don't need to keep a fs cache on disk at all. All
>> you need to store (in the index) is the clock value from watchman.
>> After we parse the index, we perform a "since" query to get path names
>> (and perhaps "exists" and "mode" for later). Then we set CE_VALID bit
>> on entries that are _not_ in the query result. The remaining items
>> will be lstat'd by git (see [1] and read-cache.c changes on the next
>> few patches). Assuming the number of updated files is reasonably
>> small, we won't  be punished by lookup time.
>
> I considered this, but rejected it for a few reasons:
> 1. We still need to know about the untracked files.  I guess we could
> do an index extension for just that, like your untracked cache.

Yes. But consider that the number of untracked files is usually small
(otherwise 'git status' would look very messy). And your fscache would
need to store excluded file list too, which could be a lot bigger (one
pair of  .[ch] -> one .o). _But_ yours would make 'git status
--ignored' work. I don't consider that a major use case for
optimization, but people may have different opinions.

> 2. That doesn't eliminate opendir/readdir, I think.  Those are a major
> cost. I did not thoroughly review your patches from January/February,
> but I didn't notice anything in there about this part of dir.c.
> Also the cost of open(nonexistent .gitignore) to do ignore processing.

Assuming untracked cache is in use, opendir/readdir is replaced with
lstat. And cheap lstat can be solved with watchman without storing
anything extra. I solve open(.gitignore) too by checking its stat data
with the one in index. If matches, I reuse the version in tree. This
does not necessarily make it cheaper, but it increases locality so it
might be. _Modified_ .gitignore files have to go through
open(.gitignore), but people don't update .gitignore often.

> 3. Changing a file in the index (e.g. git add) requires clearing
> the CE_VALID bit; this means that third-party libraries (e.g. jgit)
> that change the git repo need to understand this extension for correct
> operation.  Maybe that's just the nature of extensions, but it's not
> something that my present patch set requires.

I don't store CE_VALID on disk. Instead I store a new bit CE_WATCHED.
Only after loading and validating against watchman that I turn
CE_WATCHED to CE_VALID in memory. So JGit, libgit2 are not confused.

I assume you won't change your mind about this. Which is fine to me. I
will still try out my approach with your libwatchman though. Just
curious about its performance and complexity, compared to your
approach.

A bit off topic, but msys fork has another "fscache" in compat/win32.
If you could come up with a different name, maybe it'll be less
confusing for them after merging. But this is not a big deal, as this
fscache won't work on windows anyway.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]