Re: Watchman support for git

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 2014-05-10 at 15:16 +0700, Duy Nguyen wrote:
> On Sat, May 3, 2014 at 6:14 AM,  <dturner@xxxxxxxxxxxxxxxx> wrote:
> > The most sigificant patch uses Facebook's watchman daemon[1] to monitor
> > the repository work tree for changes.  This makes allows git status
> > to avoid traversing the entire work tree to find changes.
> 
> Some comments on this series. I still haven't been able to run it with
> watchman so there are many guesses from my side.
> 
> First of all, when I set USE_WATCHMAN=Yes in config.mak I expect it to
> work out of the box, provided that all dependencies are installed. I
> still need to set WATCHMAN_LIBS for it to build because you only set
> it with configure script. Would be nice to have a default value for
> non-configure people too.

I'll fix that, thanks.

> I'm not so happy that git now needs to link to libjansson.so and
> libwatchman.so. I would load libwatchman.so at runtime only when
> core.usewatchman is on, but this is more of personal taste.

I assume you mean something with dlopen? I don't really like that because
(a) nothing else in git works that way and
(b) you would still need the libwatchman headers to build git (or the
structs could be copied, but that is ugly).

> I still prefer my old tracking model, where the majority of lstat() is
> done by refresh operation and we only need to optimize those lstat
> calls, not every single lstat statement in the code base.

The lstat calls are only one of the problems.  The others are
opendir/readdir and open(.gitignore). 

>  With that in
> mind, I think you don't need to keep a fs cache on disk at all. All
> you need to store (in the index) is the clock value from watchman.
> After we parse the index, we perform a "since" query to get path names
> (and perhaps "exists" and "mode" for later). Then we set CE_VALID bit
> on entries that are _not_ in the query result. The remaining items
> will be lstat'd by git (see [1] and read-cache.c changes on the next
> few patches). Assuming the number of updated files is reasonably
> small, we won't  be punished by lookup time. 

I considered this, but rejected it for a few reasons:
1. We still need to know about the untracked files.  I guess we could 
do an index extension for just that, like your untracked cache.

2. That doesn't eliminate opendir/readdir, I think.  Those are a major 
cost. I did not thoroughly review your patches from January/February, 
but I didn't notice anything in there about this part of dir.c.  
Also the cost of open(nonexistent .gitignore) to do ignore processing.

3. Changing a file in the index (e.g. git add) requires clearing
the CE_VALID bit; this means that third-party libraries (e.g. jgit) 
that change the git repo need to understand this extension for correct
operation.  Maybe that's just the nature of extensions, but it's not
something that my present patch set requires.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]