On Mon, Dec 11, 2023 at 12:30 PM Jan Kara <jack@xxxxxxx> wrote: > > On Fri 08-12-23 23:02:35, Amir Goldstein wrote: > > On Fri, Dec 8, 2023 at 8:46 PM Jan Kara <jack@xxxxxxx> wrote: > > > > > > On Thu 07-12-23 14:38:24, Amir Goldstein wrote: > > > > filesystem may be modified in the context of fanotify permission events > > > > (e.g. by HSM service), so assert that sb freeze protection is not held. > > > > > > > > If the assertion fails, then the following deadlock would be possible: > > > > > > > > CPU0 CPU1 CPU2 > > > > ------------------------------------------------------------------------- > > > > file_start_write()#0 > > > > ... > > > > fsnotify_perm() > > > > fanotify_get_response() => (read event and fill file) > > > > ... > > > > ... freeze_super() > > > > ... sb_wait_write() > > > > ... > > > > vfs_write() > > > > file_start_write()#1 > > > > > > > > This example demonstrates a use case of an hierarchical storage management > > > > (HSM) service that uses fanotify permission events to fill the content of > > > > a file before access, while a 3rd process starts fsfreeze. > > > > > > > > This creates a circular dependeny: > > > > file_start_write()#0 => fanotify_get_response => > > > > file_start_write()#1 => > > > > sb_wait_write() => > > > > file_end_write()#0 > > > > > > > > Where file_end_write()#0 can never be called and none of the threads can > > > > make progress. > > > > > > > > The assertion is checked for both MAY_READ and MAY_WRITE permission > > > > hooks in preparation for a pre-modify permission event. > > > > > > > > The assertion is not checked for an open permission event, because > > > > do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not > > > > safe to write to filesystem in the content of an open permission event. > > > ^^^^^ context > > > > > > BTW, isn't this a bit inconvenient? I mean filling file contents on open > > > looks quite natural... Do you plan to fill files only on individual read / > > > write events? I was under the impression simple HSM handlers would be doing > > > it on open. > > > > > > > Naive HSMs perhaps... The problem with filling on open is that the pattern > > open();fstat();close() is quite common with applications and we found open() > > to be a sub-optimal predicate for near future read(). > > > > Filling the file on first read() access or directory on first > > readdir() access does > > a better job in "swapping in" the correct files. > > A simple HSM would just fill the entire file/dir on the first PRE_ACCESS event. > > that's not any more or less simple than filling it on an OPEN_PERM event. > > > > Another point that could get lost when reading to above deadlock is that > > filling the file content before open(O_TRUNC) would be really dumb, > > because swap in is costly and you are going to throw away the data. > > If we really wanted to provide HSM with a safe way to fill files on open, > > we would probably need to report the open flags with the open event. > > I actually think that reporting the open flags would be nice even with > > an async open event. > > OK, thanks for explanation! Anyway, I will try to find a solution also for the OPEN_PERM deadlock. I have some sketches, but ->atomic_open() complicates things... Thanks, Amir.