On 7/1/21 5:26 PM, Ævar Arnfjörð Bjarmason wrote:
On Thu, Jul 01 2021, Jeff Hostetler wrote:
On 7/1/21 1:40 PM, Ævar Arnfjörð Bjarmason wrote:
On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:
Here is V3 of my patch series to add a builtin FSMonitor daemon to Git. I
rebased this series onto v2.32.0.
V3 addresses most of the previous review comments and things we've learned
from our experimental testing of V2. (A version of V2 was shipped as an
experimental feature in the v2.32.0-based releases of Git for Windows and
VFS for Git.)
There are still a few items that I need to address, but that list is getting
very short.
...
fsmonitor-fs-listen-win32: stub in backend for Windows
fsmonitor-fs-listen-macos: stub in backend for MacOS
I left some light comments on the repo-settings.c part of this to
follow
up from a previous round.
Thanks.
Any other testing of it is stalled by there being no linux backend
for
it as part of this series. I see from spelunking repos that Johannes had
a WIP compat/fsmonitor/linux.c which looks like it could/should mostly
work, but the API names all changed since then, and after a short try I
gave up on trying to rebase it.
The early Linux version was dropped because inotify does not give
recursive coverage -- only the requested directory. Using inotify
requires adding a watch to each subdirectory (recursively) in the
worktree. There's a system limit on the maximum number of watched
directories (defaults to 8K IIRC) and that limit is system-wide.
Since the whole point was to support large very large repos, using
inotify was a non-starter, so I removed the Linux version from our
patch series. For example, the first repo I tried it on (outside
of the test suite) had 25K subdirectories.
I'm told there is a new fanotify API in recent Linux kernels that
is a better fit for what we need, but we haven't had time to investigate
it yet.
That default limit is a bit annoying, but I don't see how it's a blocker
in any way.
You simply adjust the limit. E.g. I deployed and tested the hook version
of inotify (using watchman) in a sizable development environment, and
written my own code using the API. This was all before fanotify(7)
existed. IIRC I set most of the limits to 2^24 or 2^20. I've used it
with really large Git repos, including with David Turner's
2015-04-03-1M-git for testing (`git ls-files | wc -l` on that is around
a quarter-million).
If you have a humongous repository and don't have root on your own
machine you're out of luck. But I think most people who'd use this are
either using their own laptop, or e.g. in a corporate setting where
administrators would tweak the sysctl limits given the performance
advantages (as I did).
Once you adjust the limits Linux deals with large trees just fine, it
just has overly conservative limits for most things in sysctl. The API
is a bit annoying, your event loop needs to run around and add watches.
AFAICT you need Linux 5.1 for fanotify(7) to be useful, e.g. Debian
stable, RHEL etc. aren't using anything that new. So having an inotify
backend as well as possibly a fanotify one would be very useful.
And linux.git's docs suggest that the default limit was bumped from 8192
to 1M in v5.10, a really recent kernel, so if you've got that you've
also got fanotify.
In any case, even if Linux's inotify limit was something hardcoded and
impossible to change you could still use such an API to test & debug the
basics of this feature on that platform.
Good points. If the inotify limits can be increased into the millions
then we can revisit supporting it. I do worry about possible race
conditions as we have to scan the worktree and add/delete a watch for
each directory, but we don't need to worry about that right now.
I do want to have Linux support eventually, but I was saving it for
a later effort (and/or looking for volunteers). My IPC-based builtin
daemon complements the existing hook-based fsmonitor support that Ben
Peart and Kevin Willford added a few years ago. That model (and PERL
hook script and Watchman integration) work fine for Linux, so the
advantages of a builtin monitor aren't as compelling.
For example, on Linux, hook process creation is fast, PERL is fast,
and it is easy to just apt-get a tool like Watchman. But on
Windows, process creation is slow, NTFS is slow, PERL is available
as part of the Git-for-Windows distribution, and installing third-party
tools like Watchman onto a collection of enterprise users' machines is
a chore, so it made sense of us to pick platforms where the existing
hook model has issues and add other platforms later.
Besides, this patch series is already at 34 commits. And some of
them are quite large. Adding another platform would just make it
even larger.
Right now I'm more interested in the larger question of whether
we WANT to have a builtin fsmonitor and do we like the overall
design of what I have here? Picking up a new platform, whether
it is Linux, or AIX, or Solaris, or Nonstop, or whatever, should
nicely fit into one platform-specific file in compat/fsmonitor
and not take that long.
I'd really prefer for git not to have features that place free
platforms
at a disadvantage against proprietary platforms if it can be avoided,
and in this case the lack of a Linux backend also means much less
widespread testing of the feature among the development community / CI.
This feature is always going to have platform-specific components,
so the lack of one platform or another should not stop us from
discussing it for the platforms that can be supported.
(I think per the above that's s/can be/are/)
And given the size and complexity of the platform-specific code,
we should not assume that "just test it on Linux" is sufficient.
Yes, there are some common/shared routines/data structures in the
daemon, but hard/tricky parts are in the platform layer.
I think we're talking past each other a bit here. I'm not saying that
you can get full or meaningful testing for it on Windows if you test it
on Linux, or the other way around.
Of course there's platform-specific stuff, although there's also a lot
of non-platform-specific stuff, so even a very basic implementation of
inotify would make reviwing this easier / give access to more reviewers.
I'm saying that I prefer that Git as a free software project not be in
the situation of saying the best use-case for a given size/shape of repo
is to use Git in combination with proprietary operating systems over
freely licensed ones.
I wouldn't worry about that. Even without Watchman integration,
Linux runs things so much faster than anything else it's not funny.
If anything, we need things like fsmonitor on Windows to help keep
up with Linux.
IOW what the FSF has a policy for GNU projects. Now, I think the FSF
probably goes too far in that (famously removing rather obscure font
rendering features from Emacs on OSX), but "manage lots of data" is a
core feature of git.
Jeff