Hello,

On Wed 24-06-15 10:13:50, Ashish Sangwan wrote:
> > On Tue 23-06-15 12:05:51, Ashish Sangwan wrote:
> > > > Looking into this in more detail, it might be worthwhile to revisit how
> > > > mark_mutex is used, since at least fanotify and dnotify use it for more
> > > > than just protecting the group's list of marks, and untangling this
> > > > would simplify things. But that's a longer-term goal.
> > > >
> > > > A relatively simple fix for your issue is to split the group's list of
> > > > marks into a list of inode marks and a list of mount marks. Then
> > > > destruction becomes much simpler because we always discard the whole
> > > > list (or both of them) and we can easily avoid problems with list
> > > > corruption when dropping the mark_mutex. I can write the patch later or
> > > > you can do that if you are
> > > Sorry, I could not understand why the group's list of marks needs to be
> > > split. I was browsing through the old code, from the days when mark_mutex
> > > was not present, and it looked like this:
> > >
> > > void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
> > > 					 unsigned int flags)
> > > {
> > > 	struct fsnotify_mark *lmark, *mark;
> > > 	LIST_HEAD(free_list);
> > >
> > > 	spin_lock(&group->mark_lock);
> > > 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
> > > 		if (mark->flags & flags) {
> > > 			list_add(&mark->free_g_list, &free_list);
> > > 			list_del_init(&mark->g_list);
> > > 			fsnotify_get_mark(mark);
> > > 		}
> > > 	}
> > > 	spin_unlock(&group->mark_lock);
> > >
> > > 	list_for_each_entry_safe(mark, lmark, &free_list, free_g_list) {
> > > 		fsnotify_destroy_mark(mark);
> > > 		fsnotify_put_mark(mark);
> > > 	}
> > > }
> > >
> > > How about using a temporary on-stack list_head like above?
> >
> > So we can use a temporary list_head for entries selected for destruction as
> > well. *But* in either case you have to be careful, because even the
> > temporary free_list can be modified by concurrent mark destruction, e.g.
> > from fsnotify_free_marks_by_inode(), as each mark is on two lists - one
> > hanging from the inode / mount and one hanging from the notification
> > group.
> True.
> >
> > Actually, looking into it, even fsnotify_destroy_marks() doesn't seem to
> > be properly protected from the race. Sigh.
> True.
> So anything which calls fsnotify_destroy_mark() is not protected against
> concurrent deletion from fsnotify_clear_marks_by_group_flags() and vice
> versa. Plus, as you rightly pointed out, the inode marks and the vfsmount
> marks share the same list. So even if fsnotify_clear_marks_by_group_flags()
> is removing an inode mark, a parallel fsnotify_destroy_mark() for a vfsmount
> mark can cause a crash, as they share the same list.

So I was looking more into the issue today, and fsnotify_destroy_marks() is
fine in the end: it holds a reference to each mark (so the mark cannot be
freed) and it also threads its temporary list through a special free_list
list_head within the mark, so nobody removes any mark from that list.
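The pattern being discussed - pin each object with a reference and thread the
temporary list through a *dedicated* list node, so a concurrent remover that
unlinks the object from its primary list cannot corrupt the private list -
can be sketched in plain userspace C. This is only an illustration of the
idea under simplified assumptions (no locking shown, names like
collect_by_flags() and destroy_collected() are made up, not the fsnotify
API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Tiny doubly-linked list in the style of the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

static int list_empty(const struct list_head *h) { return h->next == h; }

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * A "mark" hangs off its group via g_list and is pinned by a refcount.
 * free_list is a second, dedicated node used only while the mark sits on
 * a caller's private on-stack list, so a concurrent remover unlinking
 * g_list never touches the private list.
 */
struct mark {
	unsigned int flags;
	int refcnt;			/* starts at 1: the group's reference */
	struct list_head g_list;	/* linkage on the group's mark list */
	struct list_head free_list;	/* linkage on a temporary list */
};

static struct mark *mark_new(unsigned int flags)
{
	struct mark *m = calloc(1, sizeof(*m));

	m->flags = flags;
	m->refcnt = 1;
	INIT_LIST_HEAD(&m->g_list);
	INIT_LIST_HEAD(&m->free_list);
	return m;
}

static void put_mark(struct mark *m)	/* cf. fsnotify_put_mark() */
{
	if (--m->refcnt == 0)
		free(m);
}

/*
 * Pass 1 (in the kernel this would run under the group's lock, elided
 * here): move matching marks onto the private list via the dedicated
 * free_list node and take an extra reference so they cannot be freed.
 */
static void collect_by_flags(struct list_head *group_list, unsigned int flags,
			     struct list_head *private)
{
	struct list_head *p, *n;

	for (p = group_list->next; p != group_list; p = n) {
		struct mark *m = container_of(p, struct mark, g_list);

		n = p->next;		/* safe traversal while unlinking */
		if (m->flags & flags) {
			m->refcnt++;	/* cf. fsnotify_get_mark() */
			list_del_init(&m->g_list);
			list_add(&m->free_list, private);
		}
	}
}

/* Pass 2 (outside the lock): drop the group's reference and our pin. */
static int destroy_collected(struct list_head *private)
{
	int destroyed = 0;

	while (!list_empty(private)) {
		struct mark *m = container_of(private->next,
					      struct mark, free_list);

		list_del_init(&m->free_list);
		put_mark(m);		/* the group's reference */
		put_mark(m);		/* the pin taken in pass 1 */
		destroyed++;
	}
	return destroyed;
}
```

The key design point is the second list_head: if the temporary list were
threaded through g_list itself, a concurrent path unlinking the mark from
its other list via g_list would corrupt the collector's private list.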
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index d90deaa..d83ec7d 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -124,14 +124,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>
>  	spin_lock(&mark->lock);
>
> -	/* something else already called this function on this mark */
> -	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
> -		spin_unlock(&mark->lock);
> -		return;
> -	}
> -
> -	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> -
>  	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
>  		inode = mark->i.inode;
>  		fsnotify_destroy_inode_mark(mark);
> @@ -188,7 +180,10 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
>  			   struct fsnotify_group *group)
>  {
>  	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
> -	fsnotify_destroy_mark_locked(mark, group);
> +	if (mark->flags & FSNOTIFY_MARK_FLAG_ALIVE) {
> +		mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> +		fsnotify_destroy_mark_locked(mark, group);
> +	}
>  	mutex_unlock(&group->mark_mutex);
>  }
>
> @@ -293,14 +288,27 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
>  					 unsigned int flags)
>  {
>  	struct fsnotify_mark *lmark, *mark;
> -
> +	LIST_HEAD(free_list);
> +
>  	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
> -	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
> +	/* Pass 1: clear the alive flag and move to the free list */
> +	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list)
>  		if (mark->flags & flags) {
> +			/*
> +			 * If the mark is present on the group's mark list,
> +			 * it has to be alive.
> +			 */
> +			WARN_ON(!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE));
> +			list_del_init(&mark->g_list);
> +			list_add(&mark->g_list, &free_list);
> +			mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> +		}
> +
> +	/* Pass 2: remove the marks */
> +	list_for_each_entry_safe(mark, lmark, &free_list, g_list) {
>  		fsnotify_get_mark(mark);
>  		fsnotify_destroy_mark_locked(mark, group);
>  		fsnotify_put_mark(mark);
> -		}
>  	}
>  	mutex_unlock(&group->mark_mutex);
>  }

So this should fix the problem as well. However, the lifetime rules are so
convoluted that I'm thinking about how to make things simpler (especially how
to unify clearing via inode/mount with clearing via group). When
investigating the code I also found possible issues with mark->inode getting
cleared while we are delivering an event. So I'm looking into fixing all of
these problems.

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR