On Thu 18-10-18 19:17:53, Amir Goldstein wrote: > On Thu, Oct 18, 2018 at 1:22 PM Jan Kara <jack@xxxxxxx> wrote: > > > > Detaching of mark connector from fsnotify_put_mark() can race with > > unmounting of the filesystem like: > > > > CPU1 CPU2 > > fsnotify_put_mark() > > spin_lock(&conn->lock); > > ... > > inode = fsnotify_detach_connector_from_object(conn) > > spin_unlock(&conn->lock); > > generic_shutdown_super() > > fsnotify_unmount_inodes() > > sees connector detached for inode > > -> nothing to do > > evict_inode() > > barfs on pending inode reference > > iput(inode); > > > > Resulting in "Busy inodes after unmount" message and possible kernel > > oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode > > references from detached connectors. > > > > Note that the accounting of outstanding inode references in the > > superblock can cause some cacheline contention on the counter. OTOH it > > happens only during deletion of the last notification mark from an inode > > (or during unlinking of watched inode) and that is not too bad. I have > > measured time to create & delete inotify watch 100000 times from 64 > > processes in parallel (each process having its own inotify group and its > > own file on a shared superblock) on a 64 CPU machine. Average and > > standard deviation of 15 runs look like: > > > > Avg Stddev > > Vanilla 9.817400 0.276165 > > Fixed 9.710467 0.228294 > > > > So there's no statistically significant difference. > > > > Fixes: ??? Good point, will add this together with CC stable. > > Signed-off-by: Jan Kara <jack@xxxxxxx> > > --- > > fs/notify/fsnotify.c | 3 +++ > > fs/notify/mark.c | 31 ++++++++++++++++++++++++------- > > include/linux/fs.h | 3 +++ > > 3 files changed, 30 insertions(+), 7 deletions(-) > > > > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c > > index f174397b63a0..00d4f4357724 100644 > > --- a/fs/notify/fsnotify.c > > +++ b/fs/notify/fsnotify.c > > @@ -96,6 +96,9 @@ void fsnotify_unmount_inodes(struct super_block *sb) > > > > if (iput_inode) > > iput(iput_inode); > > + /* Wait for outstanding inode references from connectors */ > > + wait_var_event(&sb->s_fsnotify_inode_refs, > > + !atomic_long_read(&sb->s_fsnotify_inode_refs)); > > } > > > > /* > > diff --git a/fs/notify/mark.c b/fs/notify/mark.c > > index 59cdb27826de..61c58e1759ee 100644 > > --- a/fs/notify/mark.c > > +++ b/fs/notify/mark.c > > @@ -179,7 +179,7 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work) > > } > > } > > > > -static struct inode *fsnotify_detach_connector_from_object( > > +static void *fsnotify_detach_connector_from_object( > > struct fsnotify_mark_connector *conn) > > { > > struct inode *inode = NULL; > > @@ -190,6 +190,7 @@ static struct inode *fsnotify_detach_connector_from_object( > > if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { > > inode = fsnotify_conn_inode(conn); > > inode->i_fsnotify_mask = 0; > > + atomic_long_inc(&inode->i_sb->s_fsnotify_inode_refs); > > } else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { > > fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0; > > } > > @@ -211,10 +212,26 @@ static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark) > > fsnotify_put_group(group); > > } > > > > +/* Drop object reference originally held by a connector */ > > +static void fsnotify_drop_object(void *objp) > > +{ > > + struct inode *inode; > > + struct super_block *sb; > > + > > + if (!objp) > > + return; > > + /* Currently only inode references are passed to be dropped */ > > This patch would have been a whole lot shorter if you just implemented > fsnotify_drop_inode(struct inode *inode) > > Did you have anything specific in mind when you made it generic? Yes. I was afraid that if we just keep returning struct inode *, it is just too easy to drop the reference with iput() and leak the counter. Maybe I'm just too paranoid, I'll sleep to it and decide tomorrow ;). > Anyway, if you do choose to leave this function "generic", please > pass in conn->type to this function and do "the inode thing" only for > type == FSNOTIFY_OBJ_TYPE_INODE. Good point, will do if I decide to stay with opaque pointer. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR