Re: [PATCH 3/4] fsnotify: assert that file_start_write() is not held in permission hooks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 11, 2023 at 12:30 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Fri 08-12-23 23:02:35, Amir Goldstein wrote:
> > On Fri, Dec 8, 2023 at 8:46 PM Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > On Thu 07-12-23 14:38:24, Amir Goldstein wrote:
> > > > filesystem may be modified in the context of fanotify permission events
> > > > (e.g. by HSM service), so assert that sb freeze protection is not held.
> > > >
> > > > If the assertion fails, then the following deadlock would be possible:
> > > >
> > > > CPU0                          CPU1                    CPU2
> > > > -------------------------------------------------------------------------
> > > > file_start_write()#0
> > > > ...
> > > >   fsnotify_perm()
> > > >     fanotify_get_response() =>        (read event and fill file)
> > > >                               ...
> > > >                               ...                     freeze_super()
> > > >                               ...                       sb_wait_write()
> > > >                               ...
> > > >                               vfs_write()
> > > >                                 file_start_write()#1
> > > >
> > > > This example demonstrates a use case of an hierarchical storage management
> > > > (HSM) service that uses fanotify permission events to fill the content of
> > > > a file before access, while a 3rd process starts fsfreeze.
> > > >
> > > > This creates a circular dependeny:
> > > >   file_start_write()#0 => fanotify_get_response =>
> > > >     file_start_write()#1 =>
> > > >       sb_wait_write() =>
> > > >         file_end_write()#0
> > > >
> > > > Where file_end_write()#0 can never be called and none of the threads can
> > > > make progress.
> > > >
> > > > The assertion is checked for both MAY_READ and MAY_WRITE permission
> > > > hooks in preparation for a pre-modify permission event.
> > > >
> > > > The assertion is not checked for an open permission event, because
> > > > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not
> > > > safe to write to filesystem in the content of an open permission event.
> > >                                      ^^^^^ context
> > >
> > > BTW, isn't this a bit inconvenient? I mean filling file contents on open
> > > looks quite natural... Do you plan to fill files only on individual read /
> > > write events? I was under the impression simple HSM handlers would be doing
> > > it on open.
> > >
> >
> > Naive HSMs perhaps... The problem with filling on open is that the pattern
> > open();fstat();close() is quite common with applications and we found open()
> > to be a sub-optimal predicate for near future read().
> >
> > Filling the file on first read() access or directory on first
> > readdir() access does
> > a better job in "swapping in" the correct files.
> > A simple HSM would just fill the entire file/dir on the first PRE_ACCESS event.
> > that's not any more or less simple than filling it on an OPEN_PERM event.
> >
> > Another point that could get lost when reading to above deadlock is that
> > filling the file content before open(O_TRUNC) would be really dumb,
> > because swap in is costly and you are going to throw away the data.
> > If we really wanted to provide HSM with a safe way to fill files on open,
> > we would probably need to report the open flags with the open event.
> > I actually think that reporting the open flags would be nice even with
> > an async open event.
>
> OK, thanks for explanation!

Anyway, I will try to find a solution also for the OPEN_PERM deadlock.
I have some sketches, but ->atomic_open() complicates things...

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux