RE: [PATCH] fsmonitor: option to allow fsmonitor to run against network-mounted repos

Eric DeCosta <edecosta@xxxxxxxxxxxxx> · Fri, 19 Aug 2022 18:38:33 +0000

> -----Original Message-----
> From: Jeff Hostetler <git@xxxxxxxxxxxxxxxxx>
> Sent: Friday, August 19, 2022 12:50 PM
> To: Eric DeCosta via GitGitGadget <gitgitgadget@xxxxxxxxx>;
> git@xxxxxxxxxxxxxxx
> Cc: Eric DeCosta <edecosta@xxxxxxxxxxxxx>
> Subject: Re: [PATCH] fsmonitor: option to allow fsmonitor to run against
> network-mounted repos
> 
> 
> 
> On 8/18/22 4:48 PM, Eric DeCosta via GitGitGadget wrote:
> > From: Eric DeCosta <edecosta@xxxxxxxxxxxxx>
> >
> > Though perhaps not common, there are uses cases where users have large,
> > network-mounted repos. Having the ability to run fsmonitor against
> > network paths would benefit those users.
> >
> > As a first step towards enabling fsmonitor to work against
> > network-mounted repos, a configuration option, 'fsmonitor.allowRemote'
> > was introduced for Windows. Setting this option to true will override
> > the default behavior (erroring-out) when a network-mounted repo is
> > detected by fsmonitor. In order for macOS to have parity with Windows,
> > the same option is now introduced for macOS.
> 
> We might also say that this config option only allows FSMonitor
> to TRY to consider using a network-mounted repo.  And that this
> ability is considered experimental until sufficient testing can
> be completed and we can determine the combinations of
> { client os } x { server os } x { remote access } x { file system type }
> that are known to work or not work and we can update the defaults
> and the documentation accordingly.
> 
Yes, very experimental. 

> For example, on a MacOS client, we expect the local "fseventsd" service
> to send us recursive events on all files and sub directories under the
> repo root.  If the server is a Linux machine (which doesn't really do
> recursive events), does exporting the FS from the server over NFS or SMB
> (or whatever) cause the Linux host to send enough information to the
> client machine for fseventsd to synthesize the recursive event stream
> locally that FSMonitor expects.  It might.  It might not.  That
> combination should be tested (along with a lot of other combinations).
> 
> But again, this patch is just about allowing the (informed?) user to
> try it and begin testing various combinations.
> 
Yes, the point is to allow users to try it out.  Self-servingly, I have about
3K users who make heavy use of network-mounted sandboxes on the
three major platforms; all connecting via NFS or SMB to Linux file
servers.  Hardly exhaustive, but the file system change notification APIs
(inotify, FSEvents, and ReadDirectoryCHangesW) all seem to work correctly.
Thus my motivation to work on this aspect of git :-)

> 
> >
> > The the added wrinkle being that the Unix domain socket (UDS) file
> > used for IPC cannot be created in a network location; instead the
> > temporary directory is used.
> 
> This scares me a bit.  I put the socket in the .git directory
> so that we are guaranteed that only one daemon will run on the
> repository and that all clients will know where to find that socket
> (if it exists).
> 
> It looks like you're creating the UDS using a tmp pathname and
> writing the pathname to the actual .git/fsmonitor--daemon.ipc FILE.
> This adds a layer of indirection and is prone to races.
> 
Good point.

> 
> The act of creating the actual socket is protected by code in
> unix-socket.c and unix-stream-server.c to try to safely create
> the socket and avoid stepping on another active daemon (who
> currently has the server-side of the socket open).
> 
> My code also detects dead sockets (where a previous daemon died
> and failed to delete the socket).
> 
> 
> Additionally, allowing remote access means that the repo could
> be used by multiple client machines and/or by the server machine
> itself.  Consider the example of two MacOS clients mounting the
> remote repo and both wanting to start FSMonitor.  They would
> constantly fight to recreate a new local-tmp-based socket and
> update your pathname FILE and end up invalidating each other on
> each command.
> 
I see your point - they'd stomp on each other. 

As far as multiple client machines mounting the remote repo, I have
doubts that FSMonitor would even see changes made from another
machine. Worth trying out and documenting as needed - might even
be better off being considered as unsupported.

> 
> Also, if someone overwrites your new pathname FILE, but doesn't tell
> the daemon, the daemon will be orphaned -- still running, but no one
> will ever connect to it because the FILE no longer points to it.
> 
True, thanks for pointing that out.

> 
> There was a suggestion later in this thread about using a SHA-1
> or SHA-256 hash of the pathname to avoid the tmp XXXXXX pattern
> and just put the socket in $HOME (and omit the need for the new
> fsmonitor-daemon.ipc FILE completely).  This might work, but we
> need to be careful because a user might have hardlinks or symlinks
> locally so there may be more than one unique path to the repo
> on the local system.  (It is OK to have more than one daemon
> listening to a repo, just less efficient.)
> 
Ah, I see.

> 
> As an interim step, you might try using my original socket code
> plus just the config.allowRemote=true change.  And test it on a
> mounted repo where you've converted the .git directory to a .git
> file and moved contents of the .git directory to somewhere local.
> Then the UDS would be created in the local GITDIR instead of on
> the remote system.  This won't help any of the sharing cases I
> described above, but will let you experiment with getting remote
> events.
> 
Within the context of the environment that I have available to me
(macOS over NFS to a Linux file server), FSEvents is working correctly.
I can make changes at any arbitrary place inside of the repo and an
event is generated. 

It's looking like that the Unix domain socket (UDS) file should remain
where it is unless fsmonitor.allowRemote is true.

If fsmonitor.allowRemote is true then the UDS file can be located
in $HOME with the caveat that if there is more than one path to
the repo (via hard or sym links) that things might not work as
expected.  I think that's OK given the experimental nature of the
feature.

-Eric

> Jeff
> 
> 
> 
> > base-commit: 9bf691b78cf906751e65d65ba0c6ffdcd9a5a12c
> >