Re: [PATCH v3 29/34] fsmonitor--daemon: use a cookie file to sync with file system

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
>
> Teach fsmonitor--daemon client threads to create a cookie file
> inside the .git directory and then wait until FS events for the
> cookie are observed by the FS listener thread.
>
> This helps address the racy nature of file system events by
> blocking the client response until the kernel has drained any
> event backlog.
>
> This is especially important on MacOS where kernel events are
> only issued with a limited frequency.  See the `latency` argument
> of `FSeventStreamCreate()`.  The kernel only signals every `latency`
> seconds, but does not guarantee that the kernel queue is completely
> drained, so we may have to wait more than one interval.  If we
> increase the frequency, the system is more likely to drop events.
> We avoid these issues by having each client thread create a unique
> cookie file and then wait until it is seen in the event stream.

Is this a guaranteed property of any API fsmonitor might need to work
with (linux, darwin, Windows) that if I perform a bunch of FS operations
on my working tree, that if I finish up by touching this cookie file
that that'll happen last?

I'd think that wouldn't be the case, i.e. on POSIX filesystems unless
you run around fsyncing both files and directories you're not guaranteed
that they're on disk, and even then the kernel might decide to sync your
cookie earlier, won't it?

E.g. on Linux you can even have cross-FS watches, and mix & match
different FS types. I'd expect to get events in whatever
implementation-defined order the VFS layer + FS decided to sync them to
disk in & get to firing off an event for me.

Or do these APIs all guarantee that a linear view of the world is
presented to the API consumer?


> Co-authored-by: Kevin Willford <Kevin.Willford@xxxxxxxxxxxxx>
> Co-authored-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> ---
>  builtin/fsmonitor--daemon.c | 228 +++++++++++++++++++++++++++++++++++-
>  fsmonitor--daemon.h         |   5 +
>  2 files changed, 232 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/fsmonitor--daemon.c b/builtin/fsmonitor--daemon.c
> index 8249420ba18..25f18f2726b 100644
> --- a/builtin/fsmonitor--daemon.c
> +++ b/builtin/fsmonitor--daemon.c
> @@ -94,6 +94,149 @@ static int do_as_client__status(void)
>  	}
>  }
>  
> +enum fsmonitor_cookie_item_result {
> +	FCIR_ERROR = -1, /* could not create cookie file ? */
> +	FCIR_INIT = 0,
> +	FCIR_SEEN,
> +	FCIR_ABORT,
> +};
> +
> +struct fsmonitor_cookie_item {
> +	struct hashmap_entry entry;
> +	const char *name;
> +	enum fsmonitor_cookie_item_result result;
> +};
> +
> +static int cookies_cmp(const void *data, const struct hashmap_entry *he1,
> +		     const struct hashmap_entry *he2, const void *keydata)
> +{
> +	const struct fsmonitor_cookie_item *a =
> +		container_of(he1, const struct fsmonitor_cookie_item, entry);
> +	const struct fsmonitor_cookie_item *b =
> +		container_of(he2, const struct fsmonitor_cookie_item, entry);

Re earlier comments about verbose names, these are just enums in a
builtin/*.c file, so a name like "cookie_item" is fine, and then the
whole thing might even fit on one line.. :)

> [...]
> +	/*
> +	 * We will write filesystem syncing cookie files into
> +	 * <gitdir>/<fsmonitor-dir>/<cookie-dir>/<pid>-<seq>.
> +	 *
> +	 * The extra layers of subdirectories here keep us from
> +	 * changing the mtime on ".git/" or ".git/foo/" when we create
> +	 * or delete cookie files.
> +	 *
> +	 * There have been problems with some IDEs that do a
> +	 * non-recursive watch of the ".git/" directory and run a
> +	 * series of commands any time something happens.
> +	 *
> +	 * For example, if we place our cookie files directly in
> +	 * ".git/" or ".git/foo/" then a `git status` (or similar
> +	 * command) from the IDE will cause a cookie file to be
> +	 * created in one of those dirs.  This causes the mtime of
> +	 * those dirs to change.  This triggers the IDE's watch
> +	 * notification.  This triggers the IDE to run those commands
> +	 * again.  And the process repeats and the machine never goes
> +	 * idle.
> +	 *
> +	 * Adding the extra layers of subdirectories prevents the
> +	 * mtime of ".git/" and ".git/foo" from changing when a
> +	 * cookie file is created.
> +	 */
> +	strbuf_init(&state.path_cookie_prefix, 0);
> +	strbuf_addbuf(&state.path_cookie_prefix, &state.path_gitdir_watch);
> +
> +	strbuf_addch(&state.path_cookie_prefix, '/');
> +	strbuf_addstr(&state.path_cookie_prefix, FSMONITOR_DIR);
> +	mkdir(state.path_cookie_prefix.buf, 0777);
> +
> +	strbuf_addch(&state.path_cookie_prefix, '/');
> +	strbuf_addstr(&state.path_cookie_prefix, FSMONITOR_COOKIE_DIR);
> +	mkdir(state.path_cookie_prefix.buf, 0777);
> +
> +	strbuf_addch(&state.path_cookie_prefix, '/');

So, on some stupid IDEs (would be nice to have specifics in the commit
message, which ones/versions?) this avoids causing infinite activity,
but on slightly more industrious stupid IDEs that would do their own
recursive watch we'll have the same problem?

Perhaps we should just consider creating it at the top-level and those
IDEs will eventually sort out their bugs, sooner than later if this
feature ships...

> +	strbuf_release(&state.path_cookie_prefix);
> +
> +	/*
> +	 * NEEDSWORK: Consider "rm -rf <gitdir>/<fsmonitor-dir>"
> +	 */

That would also make this trivial, presumably it's a "needswork" since
you have this recursive structure, but if it's at the top-level we
already did the unlink() above, so NEEDSWORK solved then?



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux