Re: [PATCH v2 0/6] Fast git status via a file system watcher

Ben Peart <peartben@xxxxxxxxx> · Tue, 30 May 2017 19:11:25 -0400

On 5/30/2017 4:33 PM, Christian Couder wrote:
On Tue, May 30, 2017 at 8:05 PM, Ben Peart <peartben@xxxxxxxxx> wrote:

On 5/27/2017 2:57 AM, Christian Couder wrote:

On Thu, May 25, 2017 at 3:55 PM, Ben Peart <peartben@xxxxxxxxx> wrote:

On 5/24/2017 6:54 AM, Christian Couder wrote:

Design
~~~~~~

A new git hook (query-fsmonitor) must exist and be enabled
(core.fsmonitor=true).  It takes a time_t formatted as a string and
outputs to stdout all files that have been modified since the requested
time.
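
To make the hook contract above concrete, here is a minimal sketch in
Python.  It is not the Watchman sample hook from the series: it stands in
for a real watcher by walking the tree and comparing mtimes, which defeats
the performance purpose and only illustrates the interface.  The function
names and the one-path-per-line output format are assumptions made for this
illustration.

```python
import os


def modified_since(root, since):
    """Return sorted paths (relative to root) with mtime >= since."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the repository metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.stat(full).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            if mtime >= since:
                changed.append(os.path.relpath(full, root))
    return sorted(changed)


def run_hook(argv, root):
    """Emulate the hook: argv[1] is a time_t formatted as a decimal string.

    Returns the text the hook would write to stdout (assumed format:
    one path per line).
    """
    since = int(argv[1])
    return "".join(path + "\n" for path in modified_since(root, since))
```

A real hook script would write run_hook(sys.argv, os.getcwd()) to stdout,
and would ask a file system watcher for the changes instead of scanning.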

Is there a reason why there is a new hook, instead of a
"core.fsmonitorquery" config option to which you could pass whatever
command line with options?

A hook is a simple and well-defined way to integrate git with another
process.  If there is some fixed set of arguments that needs to be passed
to a file system monitor (beyond the timestamp stored in the index
extension), they can be encoded in the integration script like I've done
in the Watchman integration sample hook.

Yeah, they could be encoded in the integration script, but it could be
better if it was possible to just configure a generic command line.

For example if the directory that should be watched for filesystem
changes could be passed as well as the time since the last changes,
perhaps only a generic command line would be needed.

Maybe I'm not understanding what you have in mind but I haven't found this
to be the case in the two integrations I've done with file system watchers
(one internal and Watchman).  They require you to download, install, and
configure them by telling them about the folders you want monitored.  Then
you can start querying them for changes and processing the output to match
what git expects.  While the download and install steps vary, having that
query + process and return results wrapped up in an integration hook has
worked well.

It looks like one can also just ask watchman to monitor a directory with:

watchman watch /path/to/dir

or:

echo '["watch", "/path/to/dir"]' | watchman -j
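
As a sketch of how a hook or configured command could issue that second
form itself, given a directory passed in by git, the Python below builds
the same JSON text and pipes it to `watchman -j`.  The helper names are
hypothetical, and the call assumes watchman is installed and on PATH.

```python
import json
import subprocess


def watch_command(directory):
    """Build the JSON text that `watchman -j` reads on stdin."""
    return json.dumps(["watch", directory])


def send_to_watchman(payload):
    """Pipe a JSON command to `watchman -j` and return its raw stdout.

    Assumes a watchman binary is available on PATH.
    """
    result = subprocess.run(
        ["watchman", "-j"],
        input=payload,
        text=True,
        capture_output=True,
        check=True,
    )
    return result.stdout
```

For example, send_to_watchman(watch_command("/path/to/dir")) would send the
same text as the echo pipeline above.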

Also for example on Linux people might want to use command line tools like:

https://linux.die.net/man/1/inotifywait

and you can pass the directories you want to be watched as arguments
to these kinds of tools.

So it would be nice if we didn't require the user to configure
anything and we could just configure the watching of what we need in
the hook (or a command configured using a config option). If the hook
(or configured command) could be passed the directory by git, it could
also be generic.

OK, I think I understand what you're attempting to accomplish now.
Often, Watchman (and other similar tools) are used to do much more than
speed up git (in fact, _none_ of today's use cases are for that, since
this patch series hasn't been accepted yet :)).  They trigger builds,
verification tools, test passes, or other tasks.

I'm afraid that attempting to have the user configure git to configure 
the tool "automatically" is just adding an extra layer of complexity 
rather than making it simpler.  I'll leave that to a future patch series 
to work out.

I am also wondering about sparse checkout, as we might want to pass
all the directories we are interested in.
How is it supposed to work with sparse checkout?

The fsmonitor code works well with or without a sparse-checkout.  The file
system monitor is unaware of the sparse checkout so will notify git about
any change irrespective of whether git will eventually ignore it because the
skip worktree bit is set.

I was wondering if it could ease the job for the monitoring service
and perhaps improve performance to just ask to watch the directories
we are interested in when using sparse checkout.
On Linux it looks like a separate inotify watch is created for every
subdirectory and there is a maximum number of inotify watches per user.
This can be increased by writing to
/proc/sys/fs/inotify/max_user_watches, but it is not nice to have to
ask admins to increase this.
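
To gauge how close a working directory comes to that limit, one could
compare its directory count with the configured value.  The Python below is
a rough sketch under the assumption stated above (one inotify watch per
subdirectory).  The helper names are made up for this illustration, and
the limit is reported as None on systems without that /proc file.

```python
import os

LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def read_watch_limit(path=LIMIT_PATH):
    """Return the per-user inotify watch limit, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def count_directories(root):
    """Count root and every directory below it (symlinks not followed)."""
    # os.walk yields exactly one tuple per directory it visits.
    return sum(1 for _ in os.walk(root))


def fits_within_limit(root, limit):
    """True if one watch per directory under root stays within limit."""
    return count_directories(root) <= limit
```

The count covers the whole tree, so it is an upper bound for a watcher that
only tracks the directories of a sparse checkout.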

Having a single instance that watches the root of the working directory 
is the simplest model and minimizes use of system resources like inotify 
as there is only one needed per clone.

In addition, when the sparse-checkout file is modified, there is no need 
to try and automatically update the monitor by adding and removing 
folders as necessary.

Finally, if files or directories are excluded via sparse-checkout, they
are removed from the working directory at checkout time, so they don't add
any additional overhead to the file system watcher anyway: they clearly
can't generate write events if they don't exist.