On 7/1/21 7:17 PM, Ævar Arnfjörð Bjarmason wrote:
On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
Teach fsmonitor--daemon client threads to create a cookie file
inside the .git directory and then wait until FS events for the
cookie are observed by the FS listener thread.
This helps address the racy nature of file system events by
blocking the client response until the kernel has drained any
event backlog.
This is especially important on macOS, where kernel events are
only issued at a limited frequency. See the `latency` argument
of `FSEventStreamCreate()`. The kernel only signals every `latency`
seconds, but does not guarantee that the kernel queue is completely
drained, so we may have to wait more than one interval. If we
increase the frequency, the system is more likely to drop events.
We avoid these issues by having each client thread create a unique
cookie file and then wait until it is seen in the event stream.
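A minimal sketch of that client/listener handshake follows. It assumes POSIX threads and uses a plain temporary directory in place of `.git`. Every name in it (`cookie_table`, `cookie_wait()`, `cookie_seen()`, `fake_listener()`, the `cookie-<pid>-<seq>` naming scheme, the 1-second timeout) is hypothetical and is not taken from the fsmonitor--daemon patch. In particular, `fake_listener()` polls with `access()` instead of consuming real FSEvents, inotify or ReadDirectoryChangesW notifications. The point it illustrates is that the client registers the cookie name *before* creating the file, so an event that arrives before the client starts waiting is not lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical: the single cookie a client thread is waiting on. */
struct cookie_table {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	char pending[64]; /* cookie name being waited for ("" = none) */
	int seen;         /* set by the listener when that cookie's event arrives */
};
static unsigned long cookie_seq; /* guarded by cookie_table.lock */

/* Listener side: called for each path reported by the event stream. */
static void cookie_seen(struct cookie_table *t, const char *name)
{
	pthread_mutex_lock(&t->lock);
	if (t->pending[0] && !strcmp(t->pending, name)) {
		t->seen = 1;
		pthread_cond_broadcast(&t->cond);
	}
	pthread_mutex_unlock(&t->lock);
}

/* Client side: returns 0 once the listener reports our cookie, -1 on error/timeout. */
static int cookie_wait(struct cookie_table *t, const char *dir, int timeout_sec)
{
	char path[512];
	struct timespec deadline;
	int fd, rc;

	assert(timeout_sec > 0);
	pthread_mutex_lock(&t->lock); /* register the name BEFORE creating the file */
	snprintf(t->pending, sizeof(t->pending), "cookie-%ld-%lu", (long)getpid(), ++cookie_seq);
	t->seen = 0;
	snprintf(path, sizeof(path), "%s/%s", dir, t->pending);
	pthread_mutex_unlock(&t->lock);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd >= 0)
		close(fd);
	pthread_mutex_lock(&t->lock);
	while (fd >= 0 && !t->seen)
		if (pthread_cond_timedwait(&t->cond, &t->lock, &deadline))
			break; /* ETIMEDOUT (or an error): stop waiting */
	rc = (fd >= 0 && t->seen) ? 0 : -1;
	t->pending[0] = '\0';
	pthread_mutex_unlock(&t->lock);
	if (fd >= 0)
		unlink(path);
	return rc;
}

/* Stand-in for the OS event stream (NOT a real FS watch): polls for the cookie file. */
struct fake_arg { struct cookie_table *t; const char *dir; };
static void *fake_listener(void *p)
{
	struct fake_arg *a = p;
	char name[64], path[512];
	for (int i = 0; i < 2000; i++) { /* give up after roughly 2 s */
		pthread_mutex_lock(&a->t->lock);
		memcpy(name, a->t->pending, sizeof(name));
		pthread_mutex_unlock(&a->t->lock);
		snprintf(path, sizeof(path), "%s/%s", a->dir, name);
		if (name[0] && access(path, F_OK) == 0) {
			cookie_seen(a->t, name);
			break;
		}
		nanosleep(&(struct timespec){ 0, 1000000 }, NULL); /* 1 ms */
	}
	return NULL;
}

/* Demo: returns cookie_wait()'s result, with or without a listener thread. */
static int demo_cookie_roundtrip(int with_listener)
{
	static struct cookie_table t = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, "", 0 };
	char dir[] = "/tmp/cookie-demo-XXXXXX";
	struct fake_arg a = { &t, dir };
	pthread_t th;
	int rc;
	if (!mkdtemp(dir))
		return -1;
	if (with_listener)
		pthread_create(&th, NULL, fake_listener, &a);
	rc = cookie_wait(&t, dir, 1);
	if (with_listener)
		pthread_join(th, NULL);
	rmdir(dir);
	return rc;
}
```

With the stand-in listener running, `demo_cookie_roundtrip(1)` returns 0 as soon as the cookie is reported. With no listener, `demo_cookie_roundtrip(0)` waits out the 1-second timeout and returns -1, which is the case the timeout exists to bound.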
Is it a guaranteed property of every API fsmonitor might need to work
with (Linux, Darwin, Windows) that, if I perform a bunch of FS operations
on my working tree and finish up by touching this cookie file, the
cookie's event will arrive last?
I'd think that wouldn't be the case, i.e. on POSIX filesystems, unless
you run around fsyncing both files and directories, you're not guaranteed
that they're on disk, and even then the kernel might decide to sync your
cookie earlier, couldn't it?
E.g. on Linux you can even have cross-FS watches and mix and match
different FS types. I'd expect to get events in whatever
implementation-defined order the VFS layer and the FS decide to sync
them to disk and get around to firing off an event for me.
Or do these APIs all guarantee that a linear view of the world is
presented to the API consumer?
Theoretically, none of these APIs guarantee a complete linear ordering.
We receive events from the FS in the order that the FS decides to
perform the actual IO, and the inner workings of the FS are private.
Even if we directly read the journal rather than listening for
notifications, we probably still don't know whether the FS reordered
the queue of things heading to disk.
However, in practice, the events for the cookie files do tend to arrive
in order, and the net effect is that the worker thread in the daemon
is sync'd up with IO activity that was initiated before the request.
BTW Watchman also uses cookie files for this same reason.
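Why the handshake works in practice can be shown with a toy, single-threaded model. It *assumes* events come off a FIFO in the order the IO was issued, which, as noted above, none of these APIs actually guarantee. The names `event_queue`, `enqueue()` and `drain_until_cookie()` are hypothetical and are not the daemon's data structures.

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 16

/* Hypothetical FIFO standing in for the OS event stream. It ASSUMES events are
 * delivered in the order the IO was issued, which none of these APIs guarantee. */
struct event_queue {
	const char *paths[MAX_EVENTS];
	int head, tail;
};

static void enqueue(struct event_queue *q, const char *path)
{
	assert(q->tail < MAX_EVENTS);
	q->paths[q->tail++] = path;
}

/* Worker side: consume events until 'cookie' appears, collecting every other
 * path into 'changed' (extra paths beyond max_changed are dropped in this toy).
 * Returns the number collected, or -1 if the queue ran dry first. Events queued
 * after the cookie are left for the next request. */
static int drain_until_cookie(struct event_queue *q, const char *cookie,
			      const char **changed, int max_changed)
{
	int n = 0;

	while (q->head < q->tail) {
		const char *path = q->paths[q->head++];

		if (!strcmp(path, cookie))
			return n;
		if (n < max_changed)
			changed[n++] = path;
	}
	return -1;
}
```

Feeding it `a.c`, `b.c`, the cookie, then `c.c` reports `a.c` and `b.c` for this request and leaves `c.c` queued for the next one. If the FS had delivered `b.c` after the cookie instead, the request would come back with only `a.c`, which is exactly the ordering caveat raised above.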
It should also be noted that some operations are just racy. If you're
doing a bunch of IO in one window and a 'git status' in another window,
your result will be racy: status (without FSM) makes 2 passes over the
disk, the first to verify mtimes on items in the index and the second
to look for untracked files. The status result may be "blurry" (for
lack of a better word). So the same questions
"does the FS reorder my IO?",
"did status see the fully sync'd FS?",
etc. can also be asked in the normal (non-FSM) case, right?
So it may be the case that with an fsmonitor (mine, Watchman, etc.)
and the untracked-cache, we'll have less skew in status results,
because the status process shouldn't have to do any scanning.
But I'm not sure I want to make that assertion yet.
Thanks,
Jeff