On 12/10/19 12:49, Amir Goldstein wrote:
> [cc: Watchman maintainer]

Hi, I'm the Watchman creator and maintainer, and I also work on a FUSE-based virtual filesystem called EdenFS that works with the source control systems that we use at Facebook. I don't have much context on fanotify yet, but I do have a lot of practical experience with Watchman on various operating systems with very large recursive directory trees.

Amir asked me to participate in this discussion, and I think it's probably helpful to give a little bit of context on how we deal with some of the different watcher interfaces, and also how we see the consumers of Watchman making use of this sort of data. There are dozens of Watchman-consuming applications in common use inside FB, and a long tail of ad-hoc consumers that are not on my radar. I don't want to flood you with data that may not feel relevant, so I'm going to try to summarize some key points; I'd be happy to elaborate if you'd like more context! These are written out as numbered statements to make them easier to reference in further discussion, and they are not intended to be taken as any kind of prescriptive manifesto!

1. Humans think in terms of filenames, and applications largely only care about filenames. It's rare (it came up as a hypothetical for only one integrating application at FB in the past several years) that they care about optimizing for the various rename cases, so long as they get notified that the old name is no longer visible in the filesystem and that a new name is now visible elsewhere in the portion of the filesystem that they are watching.

2. Application authors don't want to deal with the complexities of file watching; they just want to reliably know if/when a named file has changed. Rename cookies and overflow events are too difficult for most applications to deal with at all, let alone correctly.

3. Overflow events are painful to deal with.
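To make 1. and 2. a bit more concrete, here's a rough, hypothetical sketch (this is not Watchman's actual code; the event shapes are invented for illustration) of flattening inotify-style rename pairs, which arrive as a (moved_from, moved_to) couple linked by a cookie, into the simple "old name gone, new name appeared" notifications that applications actually want:

```python
def normalize(events):
    """Flatten rename cookie pairs into plain remove/add notifications.

    events: list of (kind, cookie, path) tuples, where kind is one of
    'created', 'deleted', 'modified', 'moved_from', 'moved_to'.
    Returns a list of (kind, path) tuples with renames flattened.
    """
    pending = {}   # cookie -> old path, waiting for its moved_to half
    out = []
    for kind, cookie, path in events:
        if kind == 'moved_from':
            # Hold the old name until we see the matching moved_to.
            pending[cookie] = path
        elif kind == 'moved_to':
            old = pending.pop(cookie, None)
            if old is not None:
                out.append(('deleted', old))
            out.append(('created', path))
        else:
            out.append((kind, path))
    # A moved_from with no matching moved_to means the file was renamed
    # out of the watched portion of the tree: it is simply gone.
    for old in pending.values():
        out.append(('deleted', old))
    return out
```

So a rename of a.txt to b.txt, `normalize([('moved_from', 7, 'a.txt'), ('moved_to', 7, 'b.txt')])`, comes out as a plain delete of the old name followed by a create of the new one, which is the level most consumers want to operate at.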
In Watchman we deal with inotify overflow by re-crawling and examining the directory structure to re-synchronize with the filesystem state. For very large trees this can take a very long time.

4. Partially related to 3.: restarting the watchman server is an expensive event, because we have to re-crawl everything to re-create the directory watches with inotify. If the system provided a recursive watch function and some kind of change journal that told watchman a set of N directories to crawl (where N < M, the overall number of directories), and we had a stable identifier for files, then we could persist state across restarts and cheaply re-synchronize.

5. Also related to 3. and 4.: we use btrfs subvolumes in our CI to snapshot large repos and make them available to jobs running in different containers, potentially on different hosts. If the journal mechanism from 4. were available in this situation, it would make it super cheap to bring up watchman in those environments.

6. A downside to recursive watches on macOS is that fseventsd has very limited ability to add exceptions. A common pattern at FB is that the buck build system maintains a build-artifacts directory called `buck-out` in the repo. On Linux we can ignore change notifications for this directory at zero cost by simply not registering it with inotify. On macOS, the kernel interface allows a maximum of 8 exclusions; the rest of the changes are delivered to fseventsd, which stats and records everything in a sqlite database. This is a performance hotspot for us because the number of excluded directories in a repo exceeds 8, and the uninteresting, bulky build-artifact writes then need to transit fseventsd and into watchman before we can decide to ignore them.

7. Windows has a journal mechanism that could potentially be used as suggested in 4. above, but it requires privileged access.
I happen to know from someone at MS who worked on a similar system that there is also a way to access a subset of this data that doesn't require privileged access, but it isn't documented. I mention this because elsewhere in this thread there is a discussion about privileged access to similar-sounding information.

8. Related to 6. and 7.: if there is a privileged system daemon acting as the interface between userspace<->kernel, care needs to be taken to avoid the sort of performance hotspot we see on macOS in 6. above.

OK, hopefully that doesn't feel too off the mark! I don't think everything above needs to be handled directly at the kernel interface. Some of these details could be handled on the userspace side, either by a daemon (eg: watchman) or a suitably well-designed client library (although that can make it difficult to consume in some programming environments).

--Wez