Re: [PATCH/WIP v2 00/14] inotify support

Duy Nguyen <pclouds@xxxxxxxxx> · Mon, 20 Jan 2014 08:28:26 +0700

On Mon, Jan 20, 2014 at 12:04 AM, Thomas Rast <tr@xxxxxxxxxxxxx> wrote:
> I never got the last three patches, did you send them?
>
> Also, this doesn't cleanly apply anywhere at my end.  Can you push it
> somewhere for easier experimentation?

Sorry I rebased it on top of kb/fast-hashmap but never got around to
actually using hash tables. The code is here (and on top of master)

https://github.com/pclouds/git.git file-watcher

> So I think the watcher protocol can be specified roughly as:
>
>   Definitions: The watcher is a separate process that maintains a set of
>   _watched paths_.  Git uses the commands below to add or remove paths
>   from this set.
>
>   In addition it maintains a set of _changed paths_, which is a subset
>   of the watched paths.  Git occasionally queries the changed paths.  If
>   at any time after a path P was added to the "watched" set, P has or
>   may have changed, it MUST be added to the "changed" set.
>
>   Note that a path being "unchanged" under this definition does NOT mean
>   that it is unchanged in the index as per `git diff`.  It may have been
>   changed before the watch was issued!  Therefore, Git MUST test whether
>   the path started out unchanged using lstat() AFTER it was added to the
>   "watched" set.

Correct.

>   Handshake::
>     On connecting, git sends "hello <magic>".  <magic> is an opaque
>     string that identifies the state of the index.  The watcher MUST
>     respond with "hello <previous_magic>", where previous_magic is
>     obtained from the "bye" command (below).  If <magic> !=
>     <previous_magic>, the watcher MUST reset state for this repository.

In addition, git must reset itself (i.e. clear all CE_WATCHED flags)
because nothing can be trusted at this point.

...
>
> Did I get that approximately straight?  Perhaps you can fix up as needed
> and then turn it into the documentation for the protocol.

Will do (and probably steal some of your text above).

> There are several points about this that I don't like:
>
> * What does the datagram socket buy us?  You already do a lot of
>   pkt-line work for the really big messages, so why not just use
>   pkt-line everywhere and a streaming socket?

With datagram sockets I did not have to maintain the connected
sockets, which made it somewhat simpler to handle so far.

The default SO_SNDBUF goes up to 212k, so we can send up to that
amount without blocking. With current pkt-line we send up to 64k in
"watch" message then we have to wait for "fine", which results in more
context switches. But I think we can extend pkt-line's length field to
5 hex digit to cover this.

Streaming sockets are probably the way to go for per-user daemon, so
we can identify a socket with a repo.

> * The handshake should have room for capabilities, so that we can extend
>   the protocol.

Yeah. One more point for streaming sockets. With datagram sockets it's
harder to define a "session" and thus hard to add capabilities.

> * The hello/hello handshake is confusing, and probably would allow you
>   to point two watchers at each other without them noticing.  Make it
>   hello/ready, or some other unambiguous choice.

OK

> * I took some liberty and tried to fully define the transitions between
>   the sets.  So even though I'm not sure this is currently handled, I
>   made it well-defined to issue "watch" for a path that is in the
>   changed set.

Yes that should avoid races. The path can be removed from "watched"
set later after git acknowledges it.

> * "bye" is confusing, because in practice git issues "forget"s after
>   "bye".  The best I can come up with is "setstate", I'm sure you have
>   better ideas.

Originally it was "forget", "forget", "forget" then "bye". But with
that order, if git crashes before sending "bye" we could lose info in
"changed" set so the order was changed but I did not update the
command name.

> There's also the problem of ordering guarantees between the socket and
> inotify.  I haven't found any, so I would conservatively assume that the
> socket messages may in fact arrive before inotify, which is a race in
> the current code.  E.g., in the sequence 'touch foo; git status' the
> daemon may see
>
>   socket                    inotify
>   < hello...
>   < status
>   > new <empty list>
>                             touch foo
>
> I think a clever way to handle this would be to add a new command:
>
>   Wait::
>     This command serves synchronization.  Git creates a file of its
>     choice in $GIT_DIR/watch (say, `.git/watch/wait.<random>`).  Then it
>     sends "wait <path>".  The watcher MUST block until it has processed
>     all change notifications up to and including <path>.

So wait.<random> inotify event functions as a barrier. Nice.

> As far as inotify corner-cases go, the only one I'm aware of is
> directory renames.  I suspect we'll have to watch directories all the
> way up to the repository root to reliably detect when this happens.  Not
> sure how to best handle this.  Perhaps we should declare Git completely
> agnostic wrt such issues, and behind the scenes issue all watches up to
> the root even if we don't need them for anything other than directory
> renames.

Under normal circumstances we would watch all directories in the
worktree anyway. I'll need to write some tests for inotify..

> Ok, that's probably a confused sum of rambles.  Let me know if you can
> make any sense of it.

Thank you for your input. Now I'm back to the white board (or paper).
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html