Re: [LSF/MM TOPIC] Filesystem Change Journal API

On Thu, Jan 25, 2018 at 5:45 PM, Jan Kara <jack@xxxxxxx> wrote:
> Hi Amir!
>
> On Mon 22-01-18 11:18:49, Amir Goldstein wrote:
>> Change Journal [1] (a.k.a. USN) is a popular feature of NTFS v3, used by
>> backup and indexing applications to monitor changes to a file system
>> in a reliable, durable and scalable manner.
>>
>> Linux is lagging behind Windows w.r.t. those capabilities by two decades
>> and it is not because of a lack of demand for the feature. I dare to make a
>> wild guess that there are many more file servers nowadays running on
>> Linux than there are file servers running on Windows, and the scale of
>> changes to track has only increased over the years.
>
> Not only Windows but also MacOS which has FSEvents API [1].
>
>> On LSF/MM 2017, I presented "fanotify super block watch" [2], which
>> addresses the scalability issues of inotify when tracking changes over
>> millions of directories. This work is running in production now, but is
>> not yet ready for upstream submission.
>
> Actually I'd be interested in addressing fanotify shortcomings first before
> adding even more complexity with persistence... Adding directory events in
> the form 'something has changed' should be straightforward and good enough
> (this is the granularity of information FSEvents API from MacOS provides as
> well). We should also add some way to overcome namespace issues so that
> unshare(2) is not enough to hide your changes from mountpoint watches.

That's pretty much a subset of what I already have... just need to find the time
to carve a proper patch set and post it...

>
>> This year, I would like to discuss solutions to address the reliability
>> and durability aspects of Linux filesystem change tracking.
>>
>> Some Linux filesystems are already journaling everything (e.g. ubifs),
>> so providing the Change Journal feature to applications is probably just
>> a matter of providing an API to retrieve the latest USN and enumerate
>> changes within a USN range.
>>
>> Some Linux filesystems store USN-like information in metadata, but it is
>> not exposed to userspace in a standard way that could be used by change
>> tracking applications. For example, XFS stores LSN (transaction id) in
>> inodes, so it should be possible to enumerate inodes that were changed
>> since the last queried LSN value.
>>
>> A more generic approach, for filesystems with no USN-like information,
>> would be to provide an external change journal facility, much like what
>> JBD2 does, but not at the block level. This facility could hook in as a
>> consumer of filesystem notifications (an fsnotify backend) and provide
>> record and enumerate capabilities for filesystem operations.
>>
>> With the external change journal approach, care would have to be taken to
>> account for the fact that filesystem changes become persistent later than
>> the time they are reported to fsnotify, so at least a transaction commit
>> event (with USN) would need to be reported to fsnotify.
>
> Frankly, this is very hard and I'm not sure you can make it both race free
> and fs agnostic. I actually think it would be enough if we provided
> guaranteed persistence & consistency across clean reboots. In case of
> crashes we would just need to flag a forced rescan-the-world event for
> users of the API - again this is pretty much what FSEvents does.
>
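
To make the "retrieve the latest USN / enumerate changes within a USN range" idea and the commit-event caveat concrete, here is a minimal in-memory sketch of an external change journal. It assumes that change records arrive at notification time, before they are persistent, and only become visible to readers once a transaction commit event carrying a USN is seen. The names `Change`, `ChangeJournal`, `record`, `commit`, `latest_usn` and `enumerate_range` are hypothetical and do not correspond to any existing kernel or fanotify API.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    usn: int   # update sequence number, strictly increasing
    path: str  # object that changed (hypothetical record layout)
    op: str    # e.g. "create", "modify", "unlink"


class ChangeJournal:
    """Hypothetical external change journal fed by fsnotify-style events."""

    def __init__(self) -> None:
        self._next_usn = 1
        self._pending: list[Change] = []    # reported, not yet known to be durable
        self._committed: list[Change] = []  # visible to readers, sorted by USN
        self._usns: list[int] = []          # USNs of _committed, for bisection

    def record(self, path: str, op: str) -> int:
        """Stage a change at notification time and return its USN."""
        change = Change(self._next_usn, path, op)
        self._next_usn += 1
        self._pending.append(change)
        return change.usn

    def commit(self, upto_usn: int) -> None:
        """Transaction-commit event: every change with usn <= upto_usn is durable."""
        durable = [c for c in self._pending if c.usn <= upto_usn]
        self._pending = [c for c in self._pending if c.usn > upto_usn]
        self._committed.extend(durable)
        self._usns.extend(c.usn for c in durable)

    def latest_usn(self) -> int:
        """Highest committed USN, or 0 if nothing has been committed yet."""
        return self._usns[-1] if self._usns else 0

    def enumerate_range(self, start_usn: int, end_usn: int) -> list[Change]:
        """Committed changes with start_usn <= usn <= end_usn, in USN order."""
        lo = bisect_left(self._usns, start_usn)
        hi = bisect_right(self._usns, end_usn)
        return self._committed[lo:hi]
```

Gating visibility on the commit event means a reader is never handed a USN for a change that might not survive a crash; such records simply stay in the pending list until their commit arrives.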

The requirement from my employer that drives the need for a persistent change
log in the filesystem/kernel is that rescan-the-world takes way too much time.
So rescan-the-world cannot be the answer to the persistent change log requirement.
There are just too many files in this day and age...
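
For reference, the clean-reboot scheme discussed above (persistence guaranteed only across clean shutdowns, with a forced rescan-the-world event after a crash) can be modelled with a marker file; the sketch also shows exactly which condition triggers the expensive rescan. This is a hedged illustration: the marker file name and the `open_state`/`close_state` helpers are hypothetical, and the directory fsync assumes a POSIX system such as Linux.

```python
import os

MARKER = "clean-shutdown"  # hypothetical marker file name


def _fsync_dir(path: str) -> None:
    """Persist a directory entry change (create/unlink) on POSIX systems."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def open_state(state_dir: str) -> bool:
    """Start a session; return True if API users must rescan the world.

    The marker exists only if the previous session ended cleanly. It is
    removed (durably) at startup, so a crash before close_state() leaves
    it absent and the next startup reports that a rescan is required.
    """
    marker = os.path.join(state_dir, MARKER)
    clean = os.path.exists(marker)
    if clean:
        os.remove(marker)
        _fsync_dir(state_dir)
    return not clean


def close_state(state_dir: str) -> None:
    """Record a clean shutdown, after all change records have been flushed."""
    marker = os.path.join(state_dir, MARKER)
    with open(marker, "w") as f:
        f.flush()
        os.fsync(f.fileno())
    _fsync_dir(state_dir)
```

The very first session also reports a rescan, since there is no history to trust yet.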

>> The user API to retrieve change journal information should be standard,
>> whether the change journal is a built-in filesystem feature or provided by the
>> external change journal. The fanotify API is a good candidate for a change
>> journal API, because it already defines a standard way of reporting
>> filesystem changes. Naturally, the API would have to be extended to cater
>> to the needs of a change journal API and would require the user to explicitly
>> opt in to the new API (e.g. FAN_CLASS_CHANGE_JOURNAL).
>
> So I actually believe the persistence would be the easiest to handle
> completely in userspace as a daemon + library to access it. The daemon
> could use fanotify + database file for storage for filesystems which don't
> have built in persistent change log and hook into filesystem specific
> facility where it knows how to...
>

Sure, whatever could be done by userspace is better.
The user of the kernel change journal API *is* that change db application (e.g.
one that decides which files need to be synced to the cloud). It just can't afford
to rescan-the-world on an unclean shutdown.
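
As a rough illustration of the daemon + library split, the storage side could be a small SQLite database that the daemon appends to (for example from fanotify events) and that the library queries by USN. Only Python's standard `sqlite3` module is used; the schema and the `open_db`, `append_change` and `changes_since` names are assumptions, and the fanotify event loop that would feed `append_change` is omitted.

```python
import sqlite3

# Hypothetical schema: AUTOINCREMENT keeps USNs strictly increasing and never reused.
SCHEMA = """
CREATE TABLE IF NOT EXISTS changes (
    usn  INTEGER PRIMARY KEY AUTOINCREMENT,
    path TEXT NOT NULL,
    op   TEXT NOT NULL
)
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = FULL")  # sync to disk on each commit
    conn.execute(SCHEMA)
    return conn


def append_change(conn: sqlite3.Connection, path: str, op: str) -> int:
    """Daemon side: durably record one change and return its USN."""
    with conn:  # commits the transaction on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO changes (path, op) VALUES (?, ?)", (path, op)
        )
    return cur.lastrowid


def changes_since(conn: sqlite3.Connection, usn: int) -> list[tuple[int, str, str]]:
    """Library side: all changes with a USN strictly greater than `usn`."""
    rows = conn.execute(
        "SELECT usn, path, op FROM changes WHERE usn > ? ORDER BY usn", (usn,)
    )
    return rows.fetchall()
```

A client stores the last USN it processed and calls `changes_since` with it after a restart. Note that this only stays complete across a crash if the daemon cannot miss events between a filesystem change and its `append_change` call, which is the hard part raised earlier in the thread.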

I believe that in the absence of an external change journal implementation,
the minimal requirement from the filesystem is to provide an inode iterator and
some sort of USN-like property that can be used to filter 'changes since USN'.
This fits well with XFS's bulkstat API and the inode LSN metadata.
XFS is my target filesystem anyway, so I could go ahead and use those FS-specific
APIs, but I would like to start by looking at all the other requirements
and at what information other filesystems can provide, and try to design an API
that could work with several filesystems and at least make a future generic
implementation possible.

Cheers,
Amir.
