Re: [PATCH v2 0/6] Fast git status via a file system watcher

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>>>> Yeah, they could be encoded in the integration script, but it could be
>>>> better if it was possible to just configure a generic command line.
>>>>
>>>> For example if the directory that should be watched for filesystem
>>>> changes could be passed as well as the time since the last changes,
>>>> perhaps only a generic command line would be need.
>>>
>>> Maybe I'm not understanding what you have in mind but I haven't found
>>> this
>>> to be the case in the two integrations I've done with file system
>>> watchers
>>> (one internal and Watchman).  They require you download, install, and
>>> configure them by telling them about the folders you want monitored.
>>> Then
>>> you can start querying them for changes and processing the output to
>>> match
>>> what git expects.  While the download and install steps vary, having that
>>> query + process and return results wrapped up in an integration hook has
>>> worked well.
>>
>> It looks like one can also just ask watchman to monitor a directory with:
>>
>> watchman watch /path/to/dir
>>
>> or:
>>
>> echo '["watch", "/path/to/dir"]' | watchman -j
>>
>> Also for example on Linux people might want to use command line tools
>> like:
>>
>> https://linux.die.net/man/1/inotifywait
>>
>> and you can pass the directories you want to be watched as arguments
>> to this kind of tools.
>>
>> So it would be nice, if we didn't require the user to configure
>> anything and we could just configure the watching of what we need in
>> the hook (or a command configured using a config option). If the hook
>> (or configured command) could be passed the directory by git, it could
>> also be generic.
>
> OK, I think I understand what you're attempting to accomplish now. Often,
> Watchman (and other similar tools) are used to do much more than speed up
> git (in fact, _all_ use cases today are not used for that since this patch
> series hasn't been accepted yet :)).  They trigger builds, run verification
> tools, test passes, or other tasks.

Yeah, but some people might only be interested in installing Watchman
or similar tools to speed up "git status".

> I'm afraid that attempting to have the user configure git to configure the
> tool "automatically" is just adding an extra layer of complexity rather than
> making it simpler.

I think that for the user it makes things simpler, as the user
wouldn't have to configure anything.

For example if the hook does something like the following :

# Check that watchman is already watching the worktree
if ! watchman watch-list | grep "\"$GIT_WORK_TREE\""
then
       # Ask watchman to watch the worktree
       watchman watch "$GIT_WORK_TREE"
       exit 1
fi

# Query Watchman for all the changes since the requested time
echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1,
\"fields\":[\"name\"]}]" | \
watchman -j | \
...

Then users might not need to configure Watchman in the first place,
and if they move their repo they might not need to reconfigure
Watchman.

> I'll leave that to a future patch series to work out.

Yeah, the above improvement can be done later.

>>>> I am also wondering about sparse checkout, as we might want to pass
>>>> all the directories we are interested in.
>>>> How is it supposed to work with sparse checkout?
>>>
>>>
>>> The fsmonitor code works well with or without a sparse-checkout.  The
>>> file
>>> system monitor is unaware of the sparse checkout so will notify git about
>>> any change irrespective of whether git will eventually ignore it because
>>> the
>>> skip worktree bit is set.
>>
>> I was wondering if it could ease the job for the monitoring service
>> and perhaps improve performance to just ask to watch the directories
>> we are interested in when using sparse checkout.
>> On Linux it looks like a separate inotify watch is created for every
>> subdirectory and there is maximum amount of inotify watches per user.
>> This can be increased by writing in
>> /proc/sys/fs/inotify/max_user_watches, but it is not nice to have to
>> ask admins to increase this.
>
> Having a single instance that watches the root of the working directory is
> the simplest model and minimizes use of system resources like inotify as
> there is only one needed per clone.

>From https://linux.die.net/man/1/inotifywait:

-r, --recursive

Watch all subdirectories of any directories passed as arguments.
Watches will be set up recursively to an unlimited depth. Symbolic
links are not traversed. Newly created subdirectories will also be
watched.

Warning: If you use this option while watching the root directory of a
large tree, it may take quite a while until all inotify watches are
established, and events will not be received in this time. Also, since
one inotify watch will be established per subdirectory, it is possible
that the maximum amount of inotify watches per user will be reached.
The default maximum is 8192; it can be increased by writing to
/proc/sys/fs/inotify/max_user_watches.

> In addition, when the sparse-checkout file is modified, there is no need to
> try and automatically update the monitor by adding and removing folders as
> necessary.

Yeah, I agree it is simpler to not have to worry about the
sparse-checkout file being modified.

> Finally, if files or directories are excluded via sparse-checkout, they are
> removed from the working directory at checkout time so don't add any
> additional overhead to the file system watcher anyway as they clearly can't
> generate write events if they don't exist.

Yeah, but if you configure sparse-checkout when you already have
untracked files in many directories, like .o files, these files and
the directories they are in are not removed when you perform a
checkout, so the directories are still watched by the file system
watcher.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]