On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@xxxxxxx> wrote: > Currently notification marks are attached to object (inode or vfsmnt) by > a hlist_head in the object. The list is also protected by a spinlock in > the object. So while there is any mark attached to the list of marks, > the object must be pinned in memory (and thus e.g. last iput() deleting > inode cannot happen). Also for list iteration in fsnotify() to work, we > must hold fsnotify_mark_srcu lock so that mark itself and > mark->obj_list.next cannot get freed. Thus we are required to wait for > response to fanotify events from userspace process with > fsnotify_mark_srcu lock held. That causes issues when userspace process > is buggy and does not reply to some event - basically the whole > notification subsystem gets eventually stuck. > > So to be able to drop fsnotify_mark_srcu lock while waiting for > response, we have to pin the mark in memory and make sure it stays in > the object list (as removing the mark waiting for response could lead to > lost notification events for groups later in the list). However we don't > want inode reclaim to block on such mark as that would lead to system > just locking up elsewhere. > > This commit tries to pave a way towards solving these conflicting > lifetime needs. Instead of anchoring the list of marks directly in the > object, we anchor it in a dedicated structure (fsnotify_mark_list) and > just point to that structure from the object. Also the list is protected > by a spinlock contained in that structure. With this, we can detach > notification marks from object without having to modify the list itself. > The structural change looks very good to. It makes the code much easier to manage IMO. I am only half way though this big change, but I wanted to make one meta comment. I have a problem with the choice of naming for the new struct. 'list' is really an overloaded term and the use of 'list' as a name of a class that contains a list head makes for some really confusing constructs like list->list and mark->obj_list_head which is not a list_head struct. For future generations, I suggest that we invest the effort in choosing a name that makes more sense. I do realize how annoying it would be to fix the entire series now, so it's not a problem if renaming is done in the end of the series as long as we agree on the end result. May I suggest the name fsnotify_tap to describe the new struct. I know it is arbitrary, but not more arbitrary then fsnotify_mark and certainly not any more arbitrary then fsnotify_group. Here are some examples of constructs that will make more sense: <+#define FSNOTIFY_LIST_TYPE_INODE 0x01 <+#define FSNOTIFY_LIST_TYPE_VFSMOUNT 0x02 >+#define FSNOTIFY_TAP_TYPE_INODE 0x01 >+#define FSNOTIFY_TAP_TYPE_VFSMOUNT 0x02 LIST_TYPE_INODE implies this is a list of inodes TAP_TYPE_INODE implies this is a tap on an inode <+ /* Head of list of marks for an object [mark->lock, group->mark_mutex] */ <+ struct fsnotify_mark_list *obj_list_head; >+ /* Container for list of marks for an object [mark->lock, group->mark_mutex] */ >+ struct fsnotify_tap *tap; ... +static struct inode *fsnotify_detach_from_object(struct fsnotify_mark *mark) +{ + struct fsnotify_mark_list *list; + struct inode *inode = NULL; + bool free_list = false; + + list = mark->obj_list_head; + spin_lock(&list->lock); + hlist_del_init_rcu(&mark->obj_list); + if (hlist_empty(&list->list)) { + if (list->flags & FSNOTIFY_LIST_TYPE_INODE) { + inode = list->inode; + inode->i_fsnotify_marks = NULL; + inode->i_fsnotify_mask = 0; + list->inode = NULL; + list->flags &= ~FSNOTIFY_LIST_TYPE_INODE; + } else if (list->flags & FSNOTIFY_LIST_TYPE_VFSMOUNT) { + real_mount(list->mnt)->mnt_fsnotify_marks = NULL; + real_mount(list->mnt)->mnt_fsnotify_mask = 0; + list->mnt = NULL; + list->flags &= ~FSNOTIFY_LIST_TYPE_VFSMOUNT; + } + free_list = true; if (hlist_empty(&tap->list)) { fsnotify_detach_tap(tap); /* this helper is very called for IMO */ free_tap = true; + } else + __fsnotify_recalc_mask(list); + mark->obj_list_head = NULL; + spin_unlock(&list->lock); + + if (free_list) { + spin_lock(&destroy_lock); + list->destroy_next = list_destroy_list; + list_destroy_list = list; And killing list_destroy_list is my personal favorite... + spin_unlock(&destroy_lock); + queue_work(system_unbound_wq, &list_reaper_work); + } + + return inode; } -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html