Re: inotify on mmap writes

> On Wed, Mar 22, 2023 at 2:12 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Wed, Mar 22, 2023 at 9:43 PM Amol Dixit <amoldd@xxxxxxxxx> wrote:
> > >
> > > On Wed, Mar 22, 2023 at 12:16 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Mar 22, 2023 at 4:13 AM Amol Dixit <amoldd@xxxxxxxxx> wrote:
> > > > >
> > > > > Hello,
> > > > > Apologies if this has been discussed or clarified in the past.
> > > > >
> > > > > The lack of file modification notification events (inotify, fanotify)
> > > > > for mmap() regions is a big hole to anybody watching file changes from
> > > > > userspace. I can imagine at least 2 reasons why that support may be
> > > > > lacking, perhaps there are more:
> > > > >
> > > > > 1. mmap() writeback is async (unless msync/fsync triggered) driven by
> > > > > file IO and page cache writeback mechanisms, unlike write system calls
> > > > > that get funneled via the vfs layer, which is a convenient common place
> > > > > to issue notifications. Now mm code would have to find a common ground
> > > > > with filesystem/vfs, which is messy.
> > > > >
> > > > > 2. writepages, being an address-space op is treated by each file
> > > > > system independently. If mm did not want to get involved, onus would
> > > > > be on each filesystem to make their .writepages handlers notification
> > > > > aware. This is probably also considered not worth the trouble.
> > > > >
> > > > > So my question is, notwithstanding minor hurdles (like lost events,
> > > > > hardlinks etc.), would the community like to extend inotify support
> > > > > for mmap'ed writes to files? Under configs options, would a fix on a
> > > > > per filesystem basis be an acceptable solution (I can start with say
> > > > > ext4 writepages linking back to inode/dentry and firing a
> > > > > notification)?
> > > > >
> > > > > Eventually we will have larger support across the board and
> > > > > inotify/fanotify can be a reliable tracking mechanism for
> > > > > modifications to files.
> > > > >
> > > >
> > > > What is the use case?
> > > > Would it be sufficient if you had an OPEN_WRITE event?
> > > > or if OPEN event had the O_ flags as an extra info to the event?
> > > > I have a patch for the above and I personally find this information
> > > > missing from OPEN events.
> > > >
> > > > Are you trying to monitor mmap() calls? write to an mmaped area?
> > > > because writepages() will get you neither of these.
> > >
> > > OPEN events are not useful to track file modifications in real time,
> > > although I do see the usefulness of OPEN_WRITE events to track
> > > files that can change.
> > >
> > > I am trying to track writes to mmaped area (as these are not notified
> > > using inotify events). I wanted to ask the community of the
> > > feasibility and usefulness of this. I had some design ideas of
> > > tracking writes (using jbd commit callbacks for instance) in the
> > > kernel, but to make it generic sprucing up the inotify interface is a
> > > much better approach.
> > >
> > > Hope that provides some context.
> >
> > Not enough.
> >
> > Consider a file mmaped writable by a process that keeps writing
> > to that mapped memory for a long time.
> >
> > What do you expect to happen?
> > How many events?
> > On first time write to a page? To the memory region?
> > When dirty memory is written back to disk?
> >
> > You have mixed a lot of different things in your question.
> > You need to be more specific about what the purpose
> > of this monitoring is.
> >
> > From all of the above, only MODIFY on mmap() call
> > seems reasonable to me and MODIFY on first write to
> > an mmaped area is something that we can consider if
> > there is very good justification.
> >
> > FYI, the existing MODIFY events are from after the
> > write system call modified the page cache and there is
> > no guarantee about when writeback to disk is done.
> >
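
The gap discussed above can be observed from userspace. The following is a minimal sketch, not taken from the thread: it relies on the documented behavior in inotify(7) that modifications made through mmap(2), msync(2) and munmap(2) are not reported. The helper names count_events() and mmap_store() are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/inotify.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the events currently queued on a non-blocking inotify fd.
 * Returns the number of events, or -1 on a read error. */
static int count_events(int ifd)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;

    for (;;) {
        ssize_t n = read(ifd, buf, sizeof(buf));
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return (n < 0 && errno != EAGAIN) ? -1 : count;
        for (char *p = buf; p < buf + n; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            count++;
            p += sizeof(*ev) + ev->len;
        }
    }
}

/* Modify the first byte of the file through a shared mapping and
 * flush it with msync(). Returns 0 on success, -1 on failure. */
static int mmap_store(int fd, size_t len, char c)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = c;
    int rc = msync(p, len, MS_SYNC);
    if (munmap(p, len) < 0)
        rc = -1;
    return rc;
}
```

With an IN_MODIFY watch on the file, a pwrite() is expected to queue an event, while mmap_store() changes the file content without queuing any.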


On Thu, Mar 23, 2023 at 12:13 AM Amol Dixit <amoldd@xxxxxxxxx> wrote:
>
> Thank you Amir for taking the time. I will take another stab at the motivation.

Please do not "top post" on fsdevel discussions.

>
> Say I am writing an efficient real time file backup application, and
> monitoring changes to certain files. The best rsync can do is to chunk
> and checksum and gather delta regions to transfer. What if, through
> inotify, the application is alerted of precise extents written to a
> certain file. This would take the form of <logical file offset,
> length> tuples in the metadata attached with each MODIFY event. That
> should be easily possible (just like we add file names to CREATE
> events). For mmaped regions 'length' would be in page granularity
> since the kernel wouldn't know precise regions written within a given
> page.
>
> > What do you expect to happen?
> Notifications can be collapsed until they are read. So if first IO is
> <0, 20> and second IO is <20, 20>, then the event can be collapsed
> in-place to read <0, 40>. If they are not contiguous, say second IO is

That can be done.
I already have patches for FAN_EVENT_INFO_TYPE_RANGE.

> <30, 10>, then we will have 2 extent entries in the metadata of MODIFY
> event - <0, 20> and <30, 10>, and so on.
>

That seems like overkill.
An event that would need more than a single extent could just drop
the granular range info.
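
To illustrate the collapsing rule, here is a rough C sketch under stated assumptions: struct pending_range and pending_add() are hypothetical names for illustration only, not existing kernel or fanotify API, and the actual FAN_EVENT_INFO_TYPE_RANGE patches may look different. Adjacent or overlapping writes grow the single extent; a disjoint write drops the range info and leaves a plain MODIFY event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory form of the <logical file offset, length> tuple. */
struct extent {
    uint64_t off;
    uint64_t len;
};

/* Hypothetical range info attached to one queued, not yet read MODIFY event. */
struct pending_range {
    struct extent ext;
    bool has_event;   /* a MODIFY event is queued */
    bool has_range;   /* ext is valid; false means "anything may have changed" */
};

/* Fold a new write <off, len> into the pending event. */
static void pending_add(struct pending_range *p, uint64_t off, uint64_t len)
{
    if (!p->has_event) {
        /* First write since the last read: start a new extent. */
        p->ext.off = off;
        p->ext.len = len;
        p->has_event = true;
        p->has_range = true;
        return;
    }
    if (!p->has_range)
        return;   /* range info was already dropped */

    uint64_t end = p->ext.off + p->ext.len;
    uint64_t new_end = off + len;

    if (off > end || new_end < p->ext.off) {
        /* Disjoint write: a second extent would be needed, so drop the range. */
        p->has_range = false;
        return;
    }
    /* Adjacent or overlapping write: grow the extent in place. */
    if (off < p->ext.off)
        p->ext.off = off;
    if (new_end > end)
        end = new_end;
    p->ext.len = end - p->ext.off;
}
```

Feeding <0, 20> and then <20, 20> leaves one extent <0, 40>; feeding <0, 20> and then <30, 10> leaves a MODIFY event without range info. For mmaped writes the caller would pass page-aligned offsets and lengths.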

> > How many events?
> Events are always opportunistic. If too many events of the same kind,
> a generic "Too many changes" event is enough (CIFS change notification
> has something similar) to alert the reader.
>
> > On first time write to a page?
> Doesn't help ongoing activity tracking.
>
> > To the memory region?
> Precision as much as possible for offsets and lengths is nice to have.
>
> > When dirty memory is written back to disk?
> Events are more like hints (as you said they do not guarantee
> writeback to disk anyway). Applications will do their own integrity
> checks on top of these hints.
>

Hints, yes, but events do need to guarantee that a change is
not missed, so in the context of mmaped memory writes that
means that after the event is consumed by the application, or after
the application reads the file content, PTEs may need to be set up to
trigger a new event on the next write.

Doing that at page level seems like unacceptable overkill
for the use case of backup applications.

Perhaps a more feasible option is to generate an event when
an inode or mapping changes state to "has dirty pages"; the backup
application then needs to:

1. consume pending MODIFY events on file
2. call fdatasync()/msync()/sync_file_range()
3. read content of file to backup

And then we should be able to provide a guarantee
that if there is any write after #2 returned success,
a new MODIFY event will be generated.
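
The application side of these three steps could look roughly like the sketch below. It is only an illustration of the ordering: it uses today's inotify IN_MODIFY as a stand-in for the proposed event, uses fdatasync(2) for step 2, and the helper names drain_events() and backup_pass() are hypothetical. The guarantee for mmaped writes described above does not exist in current kernels.

```c
#include <errno.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1: consume every event queued on a non-blocking inotify fd.
 * Returns the number of events consumed, or -1 on a read error. */
static int drain_events(int ifd)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;

    for (;;) {
        ssize_t n = read(ifd, buf, sizeof(buf));
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return (n < 0 && errno != EAGAIN) ? -1 : count;
        for (char *p = buf; p < buf + n; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            count++;
            p += sizeof(*ev) + ev->len;
        }
    }
}

/* One backup pass over src_fd into dst_fd.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t backup_pass(int ifd, int src_fd, int dst_fd)
{
    char buf[65536];
    ssize_t total = 0;
    off_t off = 0;

    if (drain_events(ifd) < 0)        /* 1. consume pending events */
        return -1;
    if (fdatasync(src_fd) < 0)        /* 2. write back dirty data */
        return -1;

    for (;;) {                        /* 3. read the content to back up */
        ssize_t n = pread(src_fd, buf, sizeof(buf), off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(dst_fd, buf + done, (size_t)(n - done));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                return -1;
            done += w;
        }
        off += n;
        total += n;
    }
    return total;
}
```

The order matters: events are drained before the sync, so an event arriving during or after step 2 stays queued and triggers another pass.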

We should probably make this a new event (e.g. FAN_WRITE)
because it has different semantics than FAN_MODIFY and it can also
be useful for the non-mmapped write use case.

None of this is going to be simple though, so to answer your
original questions:

> So my question is, notwithstanding minor hurdles (like lost events,
> hardlinks etc.), would the community like to extend inotify support
> for mmap'ed writes to files?

If you are willing to do the work and you can prove that it does not
hurt performance of any existing workload when the new feature
is not in use, I think it would be a nice improvement.

> Under configs options,

No config options please.
If you cannot make it work without hurting performance, no go.

> would a fix on a per filesystem basis be an acceptable solution
> (I can start with say ext4 writepages linking back to inode/dentry
> and firing a notification)?

The solution should be generic in vfs.
It is possible that this will not be supported on all filesystems,
only on filesystems that implement some vfs operation
or opt in with some fs flag, but it should not be a fs specific implementation.

Thanks,
Amir.



