On Thu 03-11-16 12:25:13, Amir Goldstein wrote: > On Thu, Nov 3, 2016 at 12:09 AM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > We've got a report where a fanotify daemon that implements permission checks > > screws up and doesn't send a reply. This then causes widespread hangs due to > > fsnotify_mark_srcu read side lock being held and thus causing synchronize_srcu() > > called from e.g. inotify_release()-> fsnotify_destroy_group()-> > > fsnotify_mark_destroy_list() to block. > > > > Below program demonstrates the issue. It should output a single line: > > > > close(inotify_fd): success > > > > Instead it outputs nothing, which means that close(inotify_fd) got blocked by > > the waiting permission event. > > > > Wouldn't making the srcu per-group fix this? Would that be too expensive? > > > > IIUC, the purpose of fsnotify_mark_srcu is to protect the inode and vfsmount > mark lists, which are used to iterate the groups to send events to. > So you cannot make it per group as far as I can tell. > > OTOH, specifically, for fanotify, once you can passed > fanotify_should_send_event(), > there is no need to keep a reference to the mark, so it may be safely destroyed, > you only need a reference to the group. > > I tested this patch on top of my fanotify super block watch development branch > and it seems to fix the problem you reported, but I am not savvy enough with > srcu to say that this is correct. > Jan, what do you think? So the way you've written it is definitely buggy. The inode_node and vfsmount_node pointers may become invalid once you drop SRCU lock and so you cannot dereference them even after regaining that lock. Also frankly your solution looks a bit ugly. I'll think whether we can somehow fix the problem... Honza > > From 28da34cdf9bf71fe9bbac13ded11a19da3b7a48e Mon Sep 17 00:00:00 2001 > From: Amir Goldstein <amir73il@xxxxxxxxx> > Date: Thu, 3 Nov 2016 11:55:46 +0200 > Subject: [PATCH] fanotify: handle permission events without holding > fsnotify_mark_srcu > > Handling fanotify events does not entail dereferencing fsnotify_mask > beyond the point of fanotify_should_send_event(). > > For the case of permission events, which may block indefinetely, > return -EAGAIN and then fsnotify() will call handle_event() again > without a reference to the mark. > > Without a reference to the mark, there is no need to call > handle_event() under fsnotify_mark_srcu read side lock > which is protecting the inode/vfsmount mark lists. > We just need to hold a reference to the group > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > --- > fs/notify/fanotify/fanotify.c | 14 +++++++++++--- > fs/notify/fsnotify.c | 26 ++++++++++++++++++++++---- > 2 files changed, 33 insertions(+), 7 deletions(-) > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > index 604e307..0ffa9ed 100644 > --- a/fs/notify/fanotify/fanotify.c > +++ b/fs/notify/fanotify/fanotify.c > @@ -298,9 +298,17 @@ static int fanotify_handle_event(struct > fsnotify_group *group, > BUILD_BUG_ON(FAN_ACCESS_PERM != FS_ACCESS_PERM); > BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR); > > - if (!fanotify_should_send_event(inode_mark, vfsmnt_mark, mask, data > - data_type)) > - return 0; > + if (inode_mark || vfsmnt_mark) { > + if (!fanotify_should_send_event(inode_mark, vfsmnt_mark, mask, > + data, data_type)) > + return 0; > + > +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS > + /* Ask to be called again without a reference to mark */ > + if (mask & FAN_ALL_PERM_EVENTS) > + return -EAGAIN; > +#endif > + } > > pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode, > mask); > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c > index 34bccbe..dc0c86e 100644 > --- a/fs/notify/fsnotify.c > +++ b/fs/notify/fsnotify.c > @@ -199,7 +199,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, > void *data, int data_is, > { > struct hlist_node *inode_node = NULL, *vfsmount_node = NULL; > struct fsnotify_mark *inode_mark = NULL, *vfsmount_mark = NULL; > - struct fsnotify_group *inode_group, *vfsmount_group; > + struct fsnotify_group *inode_group, *vfsmount_group, *group; > struct mount *mnt; > int idx, ret = 0; > /* global tests shouldn't care about events on child only the > specific event */ > @@ -282,15 +282,33 @@ int fsnotify(struct inode *to_tell, __u32 mask, > void *data, int data_is, > ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask, > data, data_is, cookie, file_name); > > - if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS)) > - goto out; > - > if (inode_group) > inode_node = srcu_dereference(inode_node->next, > &fsnotify_mark_srcu); > if (vfsmount_group) > vfsmount_node = srcu_dereference(vfsmount_node->next, > &fsnotify_mark_srcu); > + > + if (ret != -EAGAIN) > + continue; > + > + group = inode_group ? : vfsmount_group; > + fsnotify_get_group(group); > + srcu_read_unlock(&fsnotify_mark_srcu, idx); > + /* > + * Calling handle_event() again without reference to mark, > + * so we do not need to call it under fsnotify_mark_srcu > + * which is protecting the inode/vfsmount mark lists. > + * We just need to hold a reference to the group > + */ > + return group->ops->handle_event(group, to_tell, NULL, NULL, > + mask, data, data_is, > + file_name, cookie); > + idx = srcu_read_lock(&fsnotify_mark_srcu); > + fsnotify_put_group(group); > + > + if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS)) > + goto out; > } > ret = 0; > out: > -- > 2.7.4 -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html