Hi Amir!

On Mon 22-01-18 11:18:49, Amir Goldstein wrote:
> Change Journal [1] (a.k.a USN) is a popular feature of NTFS v3, used by
> backup and indexing applications to monitor changes to a file system
> in a reliable, durable and scalable manner.
>
> Linux is lagging behind Windows w.r.t those capabilities by two decades
> and it is not because of lack of demand for the feature. I dare to make a
> wild guess that there are much more file servers nowadays running on
> Linux, than there are file servers running on Windows and the scale of
> changes to track only increased over the years.

Not only Windows, but also MacOS, which has the FSEvents API [1].

> On LSF/MM 2017, I presented "fanotify super block watch" [2], which
> addresses the scalability issues of inotify when tracking changes over
> millions of directories. This work is running in production now, but is
> not yet ready for upstream submission.

Actually, I'd be interested in addressing fanotify's shortcomings first,
before adding even more complexity with persistence... Adding directory
events in the form 'something has changed' should be straightforward and
good enough (this is also the granularity of information the MacOS
FSEvents API provides). We should also add some way to overcome namespace
issues, so that unshare(2) is not enough to hide your changes from
mountpoint watches.

> This year, I would like to discuss solutions to address the reliability
> and durability aspects of Linux filesystem change tracking.
>
> Some Linux filesystems are already journaling everything (e.g. ubifs),
> so providing the Change Journal feature to applications is probably just
> a matter of providing an API to retrieve latest USN and enumerate changes
> within USN range.
>
> Some Linux filesystems store USN-like information in metadata, but it is
> not exposed to userspace in a standard way that could be used by change
> tracking applications.
> For example, XFS stores LSN (transaction id) in inodes, so it should be
> possible to enumerate inodes that were changed since a last known
> queried LSN value.
>
> A more generic approach, for filesystems with no USN-like information,
> would be to provide an external change journal facility, much like what
> JBD2 does, but not in the block level. This facility could hook as a
> consumer of filesystem notifications as an fsnotify backend and provide
> record and enumerate capabilities for filesystem operations.
>
> With the external change journal approach, care would have to be taken to
> account for the fact that filesystem changes become persistent later than
> the time they are reported to fsnotify, so at least a transaction commit
> event (with USN) would need to be reported to fsnotify.

Frankly, this is very hard, and I'm not sure you can make it both race-free
and filesystem-agnostic. I actually think it would be enough to provide
guaranteed persistence and consistency across clean reboots. In case of a
crash, we would just need to flag a "force rescan-the-world" event to users
of the API; again, this is pretty much what FSEvents does.

> The user API to retrieve change journal information should be standard,
> whether the change journal is a built in filesystem feature or using the
> external change journal. The fanotify API is a good candidate for change
> journal API, because it already defines a standard way of reporting
> filesystem changes. Naturally, the API would have to be extended to cater
> to the needs of a change journal API and would require user to explicitly
> opt-in for the new API (e.g. FAN_CLASS_CHANGE_JOURNAL).

So I actually believe persistence would be easiest to handle completely in
userspace, as a daemon plus a library to access it. On filesystems which
don't have a built-in persistent change log, the daemon could use fanotify
plus a database file for storage, and it could hook into a
filesystem-specific facility where it knows how to...
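To make the "consistent across clean reboots, rescan after a crash" idea a bit more concrete, here is a rough C sketch of only the persistence half of such a daemon. Everything in it is hypothetical and not an existing API: struct cj, cj_open(), cj_append(), cj_enumerate(), cj_close_clean() and the journal.log/clean file layout are made up for illustration, a flat text log stands in for the database file, and the fanotify event loop that would feed cj_append() is left out (mount-wide fanotify marks need CAP_SYS_ADMIN).

```c
#define _POSIX_C_SOURCE 200809L /* access, unlink, fsync, fileno, mkdtemp */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical layout: <dir>/journal.log holds one "USN path" line per
 * change; <dir>/clean exists only after a clean shutdown. */
struct cj {
    char log_path[4096], clean_path[4096];
    FILE *log;
    unsigned long long next_usn; /* USN the next record will get */
    bool needs_rescan;           /* last run did not shut down cleanly */
};

static int cj_open(struct cj *j, const char *dir)
{
    char line[8192];
    unsigned long long usn;
    assert(j != NULL && dir != NULL);
    memset(j, 0, sizeof(*j));
    snprintf(j->log_path, sizeof(j->log_path), "%s/journal.log", dir);
    snprintf(j->clean_path, sizeof(j->clean_path), "%s/clean", dir);
    /* A log without the clean marker means the last run crashed and may
     * have lost changes, so consumers must rescan the world. */
    bool have_log = access(j->log_path, F_OK) == 0;
    bool was_clean = access(j->clean_path, F_OK) == 0;
    j->needs_rescan = have_log && !was_clean;
    /* Drop the marker so a crash during this run is detected next time. */
    if (was_clean && unlink(j->clean_path) != 0)
        return -1;
    j->log = fopen(j->log_path, "a+"); /* read anywhere, always append */
    if (j->log == NULL)
        return -1;
    rewind(j->log);
    j->next_usn = 1;
    while (fgets(line, sizeof(line), j->log)) /* recover the last USN */
        if (sscanf(line, "%llu", &usn) == 1 && usn >= j->next_usn)
            j->next_usn = usn + 1;
    return fseek(j->log, 0, SEEK_END);
}

/* Record one change; returns its USN, or 0 on error.
 * Paths containing newlines are not handled by this format. */
static unsigned long long cj_append(struct cj *j, const char *path)
{
    assert(j != NULL && j->log != NULL && path != NULL);
    if (fprintf(j->log, "%llu %s\n", j->next_usn, path) < 0)
        return 0;
    return j->next_usn++;
}

/* Visit records with USN >= from (fn may be NULL to only count them).
 * Returns the number of matching records, or -1 on error. */
static long cj_enumerate(struct cj *j, unsigned long long from,
                         void (*fn)(unsigned long long, const char *, void *),
                         void *arg)
{
    char line[8192];
    long n = 0;
    assert(j != NULL && j->log != NULL);
    if (fflush(j->log) != 0)
        return -1;
    rewind(j->log);
    while (fgets(line, sizeof(line), j->log)) {
        unsigned long long usn;
        int off = 0;
        if (sscanf(line, "%llu %n", &usn, &off) != 1 || usn < from)
            continue;
        line[strcspn(line, "\n")] = '\0';
        if (fn != NULL)
            fn(usn, line + off, arg);
        n++;
    }
    return fseek(j->log, 0, SEEK_END) == 0 ? n : -1;
}

/* Clean shutdown: make the log durable, then create the marker.
 * (A real daemon would also fsync the directory; omitted here.) */
static int cj_close_clean(struct cj *j)
{
    int rc = 0;
    assert(j != NULL && j->log != NULL);
    if (fflush(j->log) != 0 || fsync(fileno(j->log)) != 0)
        rc = -1;
    if (fclose(j->log) != 0)
        rc = -1;
    j->log = NULL;
    if (rc == 0) {
        FILE *m = fopen(j->clean_path, "w");
        if (m == NULL || fclose(m) != 0)
            rc = -1;
    }
    return rc;
}
```

A consumer would persist the last USN it has processed, call cj_enumerate() from the next USN on restart, and fall back to a full rescan whenever needs_rescan is set, which is the FSEvents-like behaviour described above.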
Honza

[1] https://developer.apple.com/library/content/documentation/Darwin/Conceptual/FSEvents_ProgGuide/UsingtheFSEventsFramework/UsingtheFSEventsFramework.html

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR