On 4/26/21 3:49 PM, Derrick Stolee wrote:
On 4/1/2021 11:40 AM, Jeff Hostetler via GitGitGadget wrote:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
Teach fsmonitor--daemon to create token-ids and define the
overall token naming scheme.
...
+/*
+ * Requests to and from a FSMonitor Protocol V2 provider use an opaque
+ * "token" as a virtual timestamp. Clients can request a summary of all
+ * created/deleted/modified files relative to a token. In the response,
+ * clients receive a new token for the next (relative) request.
+ *
+ *
+ * Token Format
+ * ============
+ *
+ * The contents of the token are private and provider-specific.
+ *
+ * For the built-in fsmonitor--daemon, we define a token as follows:
+ *
+ * "builtin" ":" <token_id> ":" <sequence_nr>
+ *
+ * The <token_id> is an arbitrary OPAQUE string, such as a GUID,
+ * UUID, or {timestamp,pid}. It is used to group all filesystem
+ * events that happened while the daemon was monitoring (and in-sync
+ * with the filesystem).
+ *
+ * Unlike FSMonitor Protocol V1, it is not defined as a timestamp
+ * and does not define less-than/greater-than relationships.
+ * (There are too many race conditions to rely on file system
+ * event timestamps.)
+ *
+ * The <sequence_nr> is a simple integer incremented for each event
+ * received. When a new <token_id> is created, the <sequence_nr> is
+ * reset to zero.
+ *
+ *
+ * About Token Ids
+ * ===============
+ *
+ * A new token_id is created:
+ *
+ * [1] each time the daemon is started.
+ *
+ * [2] any time that the daemon must re-sync with the filesystem
+ * (such as when the kernel drops or we miss events on a very
+ * active volume).
+ *
+ * [3] in response to a client "flush" command (for dropped event
+ * testing).
+ *
+ * [4] MAYBE: We might want to change the token_id after very complex
+ * filesystem operations are performed, such as a directory move
+ * sequence that affects many files within. It might be simpler
+ * to just give up and fake a re-sync (and let the client do a
+ * full scan) than try to enumerate the effects of such a change.
+ *
+ * When a new token_id is created, the daemon is free to discard all
+ * cached filesystem events associated with any previous token_ids.
+ * Events associated with a non-current token_id will never be sent
+ * to a client. A token_id change implicitly means that the daemon
+ * has a gap in its event history.
+ *
+ * Therefore, clients that present a token with a stale (non-current)
+ * token_id will always be given a trivial response.
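For illustration, here is a C sketch of the token grammar and the stale-token rule from the comment above. This is not the daemon's code. format_token(), parse_token() and token_is_stale() are hypothetical names, and the 128-byte token_id buffer is an arbitrary limit. The sequence number is taken from after the last ':', so the opaque token_id is copied out but never interpreted.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "builtin:<token_id>:<seq_nr>" into buf.
 * Returns the token length, or -1 if buf is too small. */
static int format_token(char *buf, size_t len, const char *token_id,
			uint64_t seq_nr)
{
	int n;

	assert(buf && token_id);
	n = snprintf(buf, len, "builtin:%s:%" PRIu64, token_id, seq_nr);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: split a client token into <token_id> and
 * <sequence_nr>. Returns 0 on success, -1 if the token is malformed. */
static int parse_token(const char *token, char *id_buf, size_t id_len,
		       uint64_t *seq_nr)
{
	static const char prefix[] = "builtin:";
	const char *id, *colon, *p;
	size_t id_size;
	uint64_t v = 0;

	if (strncmp(token, prefix, sizeof(prefix) - 1))
		return -1;		/* not a builtin token */
	id = token + sizeof(prefix) - 1;
	colon = strrchr(id, ':');	/* last ':' starts the sequence_nr */
	if (!colon || colon == id || !colon[1])
		return -1;		/* missing token_id or sequence_nr */
	id_size = (size_t)(colon - id);
	if (id_size >= id_len)
		return -1;		/* token_id too long for id_buf */
	for (p = colon + 1; *p; p++) {
		unsigned d;

		if (*p < '0' || *p > '9')
			return -1;	/* sequence_nr must be decimal */
		d = (unsigned)(*p - '0');
		if (v > (UINT64_MAX - d) / 10)
			return -1;	/* sequence_nr overflows uint64_t */
		v = v * 10 + d;
	}
	memcpy(id_buf, id, id_size);
	id_buf[id_size] = '\0';
	*seq_nr = v;
	return 0;
}

/* Returns 1 if the client's token_id is not the current one (or the
 * token is malformed), meaning the client only gets a trivial response. */
static int token_is_stale(const char *client_token, const char *current_id)
{
	char id[128];
	uint64_t seq;

	if (parse_token(client_token, id, sizeof(id), &seq))
		return 1;	/* malformed: treat like a stale token */
	return strcmp(id, current_id) != 0;
}
```

The token_id is only ever compared for equality and is never ordered. This matches the comment's point that token_ids define no less-than/greater-than relationship.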
From this comment, it seems that concurrent Git
commands will race to advance the FS Monitor token and one of them
will lose, causing a full working directory scan. There is no list
of "recent" tokens.
I could see this changing in the future, but for now it is a
reasonable simplification.
The daemon only creates a new token-id when it must, because it has
lost sync with the FS. And the sequence-nr is advanced based
upon the quantity of FS activity. Clients don't cause either to
change or advance (except for the flush, which is a testing hack).
Ideally, the token-id is created when the daemon starts up and is
never changed.
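Below is a rough model of the lifecycle described above. It shows why concurrent clients do not race: only the daemon changes the state, and a client request merely reads it. All names are hypothetical. A {pid, counter} pair stands in for the GUID or {timestamp,pid} token_id. The reply is simplified to a count of unseen events rather than a list of changed paths.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical daemon-side state; only the daemon mutates it. */
struct token_state {
	char token_id[64];	/* current opaque token_id */
	uint64_t seq_nr;	/* events received under this token_id */
	unsigned generation;	/* makes each new token_id unique */
};

/* Called at daemon start-up, after a loss of sync, or (for testing)
 * on a client "flush": new token_id, sequence_nr reset to zero. */
static void new_token_id(struct token_state *st, long pid)
{
	assert(st);
	st->generation++;
	snprintf(st->token_id, sizeof(st->token_id), "%ld.%u", pid,
		 st->generation);
	st->seq_nr = 0;
}

/* Each filesystem event received from the OS advances the sequence. */
static void on_fs_event(struct token_state *st)
{
	st->seq_nr++;
}

/* Answer a client presenting (token_id, seq_nr) without modifying any
 * state. Returns the number of events the client has not seen, or -1
 * if its token is stale/invalid and it must do a full scan. */
static long long events_since(const struct token_state *st,
			      const char *client_id, uint64_t client_seq)
{
	if (strcmp(client_id, st->token_id) || client_seq > st->seq_nr)
		return -1;
	return (long long)(st->seq_nr - client_seq);
}
```

Two clients that present the same token get the same answer, and neither request advances seq_nr. Only a new token_id (a resync) turns older tokens into full-scan requests.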
Concurrent clients are all served normalized event data from the
in-memory cache/queue by threads that read the queue in parallel.
I included [4] as a possible future enhancement, but so far I haven't
needed it. The event stream from the OS (at least on Windows and
macOS) has been sufficient without it.
I'll remove [4] from the comments.
Thanks,
Jeff