Re: [PATCH v9 00/30] Builtin FSMonitor Part 2

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Mon, 28 Mar 2022 12:27:54 -0400

On 3/25/22 8:48 PM, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Mar 25 2022, Jeff Hostetler wrote:
>
>> On 3/25/22 3:02 PM, rsbecker@xxxxxxxxxxxxx wrote:
>>> On March 25, 2022 2:03 PM, Jeff Hostetler wrote:
>>>> [...]
>> [...]
>
> Wouldn't it be much simpler POC in this case to write "watchman
> backend"?  Then we'd both get a Linux backend, and an alternate backend
> for the other platforms to validate their implementation.
>
> Some past references to that:
> https://lore.kernel.org/git/871r8c73ej.fsf@xxxxxxxxxxxxxxxxxxx/ &
> https://lore.kernel.org/git/87h7lgfchm.fsf@xxxxxxxxxxxxxxxxxxx/

Yes, there are several ways for a client command, such as any
command that calls read_index/refresh_index, to get FS change data
from a monitoring service.

Let's go thru the options here for the sake of conversation:

(option 1): Use the hook-like mechanism that Ben built in 2017
            to talk to an intermediary program (shell script, Perl
            script, etc.).  That "script" itself then talks to a
            long-running service/daemon, such as Watchman, to get
            the list of changes and relays them back to the client.

    * This "proxy" has to handle protocol format conversions.
    * It may also have to start the service on new repos.
    * It also depends upon a third-party service being installed.
    * We are limited to supporting platforms where the third-party
      tool is supported.
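
To make the "protocol format conversions" bullet concrete, here is a
rough sketch of what the proxy has to do.  It assumes a Watchman-style
JSON reply (with "clock", "files" and "is_fresh_instance" fields, and
a query that asked only for file names) and the version 2 hook output
as I understand it from the githooks documentation: the new token, a
NUL, then NUL-separated paths, with "/" meaning "everything may have
changed".  The function name is made up for illustration.

```python
def to_hook_v2_output(service_reply: dict) -> bytes:
    """Convert a Watchman-style reply into hook (version 2) output.

    Assumed input shape: {"clock": str, "files": [str, ...],
    "is_fresh_instance": bool}.  This is a sketch, not the sample hook.
    """
    token = service_reply["clock"].encode("utf-8")
    if service_reply.get("is_fresh_instance"):
        # The service has no history for this token: report everything.
        paths = [b"/"]
    else:
        paths = [name.encode("utf-8") for name in service_reply.get("files", [])]
    # New token, NUL, then the paths separated by NUL.
    return token + b"\0" + b"\0".join(paths)
```

A real proxy would also have to build the query, talk to the service
socket, and handle errors; only the format conversion is shown here.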

(option 2): Replace the hook with builtin client code to talk
            directly to the service and bypass the need for
            the proxy script/executable.

    * Git client code would need client-side IPC to talk to
      an established and running service.  (Similar to the client
      side of Simple-IPC but probably not pkt-line based.)
    * Git client code would now need to handle any protocol
      format conversions.
    * Git client code might also have to start the service.
    * And we'd still be dependent on a third-party service being
      installed.
    * And we are still limited to supporting platforms where
      the third-party tool is supported.
    * So far we've been assuming that that third-party tool is
      "Watchman", but technically, you could have other such
      services available.
      * So you may need multiple implementations of option 2,
        one for each third-party tool.
      * I'm not saying that this is hard, but just yet another
        detail that would have to be encoded in the Git source
        to get this "free" feature.
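
As a sketch of the "one implementation per third-party tool" point:
each supported service would need its own adapter that maps that
service's reply onto whatever common shape the client code expects.
Everything below is hypothetical.  The "other-tool" service and its
reply fields are invented, and the Watchman-style fields assume a
query that asked only for file names.

```python
from typing import Callable, Dict, List, Tuple

# Common shape the client would consume: (new token, changed paths).
Result = Tuple[str, List[str]]


def from_watchman_style(reply: dict) -> Result:
    # Assumed reply shape: {"clock": str, "files": [str, ...]}
    return reply["clock"], list(reply.get("files", []))


def from_other_tool(reply: dict) -> Result:
    # Invented reply shape for a made-up second service.
    return str(reply["cursor"]), [c["path"] for c in reply.get("changes", [])]


# One entry per third-party service we would have to know about.
ADAPTERS: Dict[str, Callable[[dict], Result]] = {
    "watchman": from_watchman_style,
    "other-tool": from_other_tool,
}


def normalize(tool: str, reply: dict) -> Result:
    if tool not in ADAPTERS:
        raise ValueError(f"no adapter for service {tool!r}")
    return ADAPTERS[tool](reply)
```

None of this is hard, but each table entry is code that has to be
written, tested, and kept in sync with the third-party tool.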

(option 3): Git implements a daemon to monitor the file system
            directly.

    * Git owns the protocol between client and service.
    * Git owns the backend, so no third-party tools required.
    * Git owns service startup.
    * Unfortunately, we are also responsible for building the
      backends on each platform we want to support.

    * In the future, we could augment the service to be more
      "Git-aware", such as discarding data for ignored files,
      but that is just speculation at this point.
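
To illustrate what "owning the protocol" means, here is a toy model
of a token-based "what changed since X" service.  The token format
and class name are invented for this sketch and are not the format
the builtin daemon uses; the platform backend is reduced to a
record() call.

```python
class ToyMonitor:
    """In-memory stand-in for a monitoring daemon (illustrative only)."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.events = []  # paths, in the order the backend reported them

    def record(self, path: str) -> None:
        # A real backend would call this from its platform event loop.
        self.events.append(path)

    def token(self) -> str:
        # Hypothetical token: "<session>:<number of events seen so far>".
        return f"{self.session_id}:{len(self.events)}"

    def changes_since(self, token: str):
        sid, _, seq = token.rpartition(":")
        if sid != self.session_id or not seq.isdigit() or int(seq) > len(self.events):
            # Unknown or stale token: tell the client to assume everything changed.
            return self.token(), ["/"]
        return self.token(), sorted(set(self.events[int(seq):]))
```

A client would remember the returned token and send it back on the
next request, so each reply covers only the events in between.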

Now, with that context in place:

[1] Nothing prohibits us from having all three options be available
    on a platform.  They should all be able to coexist.

[2] One of my stated goals was to reduce the dependency on
    third-party tools -- especially on platforms that don't have
    a simple package management system.  The point here was to
    make it easier for enterprises to deploy Git to 1k's or 10k's
    of users (and possibly unattended build machines) and make use
    of the feature without *also* having to deploy and track updates
    to yet-another third-party tool or otherwise complicate their ES
    deployment setups.  Only option 3 gets rid of the third-party
    tool requirement.

[3] Option 2 is a valuable suggestion, don't get me wrong.  It can/
    will/should improve performance over option 1 by eliminating an
    extra process creation and the overhead of pumping all of that
    data thru another socket-pair/process and all of the context
    switches that that requires.

[4] Option 2 and option 3 could/should perform relatively equally.
    And if we wanted to deprecate the hook-like interface, doing
    an option 2 implementation would allow us to transition the
    platforms for which I don't currently have a backend.

[5] However, option 2 does not eliminate the need for a third-party
    tool, so it is of limited interest to me at this time.  Yes, it
    would be nice to have it for testing and perf testing purposes
    and comparisons with option 3, but if I have to budget my time,
    I would rather spend my efforts on additional backends.

    I consider doing option 2 and doing a Linux backend to be
    two completely independent topics -- topics that we can
    discuss and/or pursue in parallel if there is interest.

[6] Randall's question was about doing option 3 and I hope that I
    provided helpful information should he or anyone else want to
    pick up that effort before I can.

[7] If you want to start a parallel conversation on option 2, let's
    do that in a new top-level email thread.

Cheers,
Jeff