Re: [PATCH 11/23] fsmonitor--daemon: define token-ids

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Fri, 30 Apr 2021 12:17:16 -0400

On 4/26/21 3:49 PM, Derrick Stolee wrote:
On 4/1/2021 11:40 AM, Jeff Hostetler via GitGitGadget wrote:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Teach fsmonitor--daemon to create token-ids and define the
overall token naming scheme.
...
+/*
+ * Requests to and from a FSMonitor Protocol V2 provider use an opaque
+ * "token" as a virtual timestamp.  Clients can request a summary of all
+ * created/deleted/modified files relative to a token.  In the response,
+ * clients receive a new token for the next (relative) request.
+ *
+ *
+ * Token Format
+ * ============
+ *
+ * The contents of the token are private and provider-specific.
+ *
+ * For the built-in fsmonitor--daemon, we define a token as follows:
+ *
+ *     "builtin" ":" <token_id> ":" <sequence_nr>
+ *
+ * The <token_id> is an arbitrary OPAQUE string, such as a GUID,
+ * UUID, or {timestamp,pid}.  It is used to group all filesystem
+ * events that happened while the daemon was monitoring (and in-sync
+ * with the filesystem).
+ *
+ *     Unlike FSMonitor Protocol V1, it is not defined as a timestamp
+ *     and does not define less-than/greater-than relationships.
+ *     (There are too many race conditions to rely on file system
+ *     event timestamps.)
+ *
+ * The <sequence_nr> is a simple integer incremented for each event
+ * received.  When a new <token_id> is created, the <sequence_nr> is
+ * reset to zero.
+ *
+ *
+ * About Token Ids
+ * ===============
+ *
+ * A new token_id is created:
+ *
+ * [1] each time the daemon is started.
+ *
+ * [2] any time that the daemon must re-sync with the filesystem
+ *     (such as when the kernel drops or we miss events on a very
+ *     active volume).
+ *
+ * [3] in response to a client "flush" command (for dropped event
+ *     testing).
+ *
+ * [4] MAYBE We might want to change the token_id after very complex
+ *     filesystem operations are performed, such as a directory move
+ *     sequence that affects many files within.  It might be simpler
+ *     to just give up and fake a re-sync (and let the client do a
+ *     full scan) than try to enumerate the effects of such a change.
+ *
+ * When a new token_id is created, the daemon is free to discard all
+ * cached filesystem events associated with any previous token_ids.
+ * Events associated with a non-current token_id will never be sent
+ * to a client.  A token_id change implicitly means that the daemon
+ * has gap in its event history.
+ *
+ * Therefore, clients that present a token with a stale (non-current)
+ * token_id will always be given a trivial response.

 From this comment, it seems to be the case that concurrent Git
commands will race to advance the FS Monitor token and one of them
will lose, causing a full working directory scan. There is no list
of "recent" tokens.

I could see this changing in the future, but for now it is a
reasonable simplification.

The daemon only creates a new token-id when it needs to because of
a loss of sync with the FS.  And the sequence-nr is advanced based
upon the quantity of FS activity.  Clients don't cause either to
change or advance (except for the flush, which is a testing hack).

Ideally, the token-id is created when the daemon starts up and is
never changed.

Concurrent clients all receive normalized event data from the
in-memory cache/queue from threads reading the queue in parallel.

I included [4] as a possible future enhancement, but so far haven't
actually needed it.  The event stream (at least on Windows and MacOS)
from the OS is sufficient that I didn't need to implement that.

I'll remove [4] from the comments.

Thanks,
Jeff