On Thu 25-01-18 18:26:21, Amir Goldstein wrote:
> On Thu, Jan 25, 2018 at 5:45 PM, Jan Kara <jack@xxxxxxx> wrote:
> >> A more generic approach, for filesystems with no USN-like information,
> >> would be to provide an external change journal facility, much like what
> >> JBD2 does, but not in the block level. This facility could hook as a
> >> consumer of filesystem notifications as an fsnotify backend and provide
> >> record and enumerate capabilities for filesystem operations.
> >>
> >> With the external change journal approach, care would have to be taken to
> >> account for the fact that filesystem changes become persistent later than
> >> the time they are reported to fsnotify, so at least a transaction commit
> >> event (with USN) would need to be reported to fsnotify.
> >
> > Frankly, this is very hard and I'm not sure you can make it both race free
> > and fs agnostic. I actually think it would be enough if we provided
> > guaranteed persistence & consistency across clean reboots. In case of
> > crashes we would just need to flag that force rescan-the-world event for
> > users of the API - again this is pretty much what FSEvents does.
> >
>
> The requirement from my employer that drives the need for persistent change
> log in filesystem/kernel is that rescan-the-world takes way too much time.
> So rescan-the-world cannot be the answer to persistent change log requirement.
> There are just too many files these days...

I can understand that but then you are basically bound to solutions that
tie directly into filesystem's consistency tracking machinery (be it
journalling, COW-like methods, or anything else). I.e., you have to
implement the change journal independently for each filesystem. And also
live with the fact that some filesystems will never support this because
they cannot achieve such consistency guarantees.
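
As a rough illustration of the clean-reboot scheme quoted above (persistence
guaranteed across clean shutdowns, a forced rescan flagged after a crash), a
userspace change log could keep a marker file that exists only while the log
is known to be consistent. This is a minimal sketch, not a proposed
implementation: the `ChangeLog` class, the file names and the record format
are all hypothetical, and feeding it from fanotify is left out entirely.

```python
import json
import os


class ChangeLog:
    """Append-only change log with a clean-shutdown marker (hypothetical)."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "changes.log")
        self.clean_path = os.path.join(directory, "clean")
        # A log without the marker means the last session did not close
        # cleanly, so consumers must be told to rescan the world.
        had_log = os.path.exists(self.log_path)
        self.needs_rescan = had_log and not os.path.exists(self.clean_path)
        # Drop the marker while running; a crash leaves it missing.
        if os.path.exists(self.clean_path):
            os.unlink(self.clean_path)
        self._log = open(self.log_path, "a", encoding="utf-8")

    def record(self, path, event):
        # One JSON object per line; the format is an assumption.
        self._log.write(json.dumps({"path": path, "event": event}) + "\n")

    def close(self):
        # Make the records durable before claiming the log is consistent.
        self._log.flush()
        os.fsync(self._log.fileno())
        self._log.close()
        # Simplification: a real daemon would also fsync the directory.
        with open(self.clean_path, "w") as marker:
            marker.flush()
            os.fsync(marker.fileno())
```

Note that this only detects an unclean shutdown of the logger itself. It says
nothing about whether the filesystem changes it recorded reached disk, which
is the race discussed above.
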

> >> The user API to retrieve change journal information should be standard,
> >> whether the change journal is a built in filesystem feature or using the
> >> external change journal. The fanotify API is a good candidate for change
> >> journal API, because it already defines a standard way of reporting
> >> filesystem changes. Naturally, the API would have to be extended to cater
> >> to the needs of a change journal API and would require user to explicitly
> >> opt-in for the new API (e.g. FAN_CLASS_CHANGE_JOURNAL).
> >
> > So I actually believe the persistence would be the easiest to handle
> > completely in userspace as a daemon + library to access it. The daemon
> > could use fanotify + database file for storage for filesystems which don't
> > have built in persistent change log and hook into filesystem specific
> > facility where it knows how to...
> >
>
> Sure, whatever could be done by userspace is better. The user of kernel
> change journal API *is* that change db application, (e.g. which decides
> which files need to be synced to the cloud). It just can't afford to
> rescan-the-world on non clean shutdown.
>
> I believe that in the absence of an external change journal implementation,
> the minimal requirement from filesystem is to provide an inode iterator and
> some sort of USN-like property that can be used to filter 'changes since USN'.

Well, for the sizes of filesystems you speak about here, is a bulkstat of
the whole filesystem really viable? I know it is way faster than scanning
through directory hierarchy but still...

> This fits well to XFS's bulkstat API and the inode LSN metadata.
> XFS is my target filesystem anyway, so I could go ahead and use those FS
> specific APIs, but would like to start with looking at all other requirements
> and what information other filesystems can provide and try to design an API
> that could work with several filesystems and at least make a future generic
> implementation possible.

Do you really need LSN in the above scheme? Would not mtime + i_version be
enough for your purposes? That should be much easier to get among
filesystems...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
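
For illustration only, the "changes since a checkpoint" filter based on mtime
might look roughly like the sketch below. The function name `changed_since`
and the nanosecond checkpoint are assumptions made for the example. i_version
is left out because the sketch sticks to what `os.stat()` returns, and the
directory walk stands in for an inode iterator such as bulkstat, so it shows
the filtering step and not a viable scan strategy for very large filesystems.

```python
import os


def changed_since(root, checkpoint_ns):
    """Yield paths under root whose mtime is newer than checkpoint_ns."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except FileNotFoundError:
                continue  # entry was removed while we were walking
            if st.st_mtime_ns > checkpoint_ns:
                yield path
```

mtime alone can be set backwards with utimes(), which is one reason to pair it
with a change counter that userspace cannot rewrite.
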