On Sun, Jul 19, 2015 at 1:21 PM, Kinglong Mee <kinglongmee@xxxxxxxxx> wrote:
> On 7/15/2015 21:21, Jan Kara wrote:
>> From: Jan Kara <jack@xxxxxxx>
>>
>> fsnotify_clear_marks_by_group_flags() can race with
>> fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
>> mark_mutex, a mark from the list iterated by
>> fsnotify_clear_marks_by_group_flags() can be freed and we dereference
>> free memory in the loop there.
>>
>> Fix the problem by keeping mark_mutex held in
>> fsnotify_destroy_mark_locked(). The reason why we drop that mutex is
>> that we need to call a ->freeing_mark() callback which may acquire
>> mark_mutex again. To avoid this and similar lock inversion issues, we
>> move the call to ->freeing_mark() callback to the kthread destroying the
>> mark.
>>
>> Reported-by: Ashish Sangwan <a.sangwan@xxxxxxxxxxx>
>> Suggested-by: Lino Sanfilippo <LinoSanfilippo@xxxxxx>
>> Signed-off-by: Jan Kara <jack@xxxxxxx>
>> ---
>>  fs/notify/mark.c | 34 ++++++++++++++--------------------
>>  1 file changed, 14 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
>> index 92e48c70f0f0..3e594ce41010 100644
>> --- a/fs/notify/mark.c
>> +++ b/fs/notify/mark.c
>> @@ -152,31 +152,15 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>>                  BUG();
>>
>>          list_del_init(&mark->g_list);
>> -
>>          spin_unlock(&mark->lock);
>>
>>          if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
>>                  iput(inode);
>> -        /* release lock temporarily */
>> -        mutex_unlock(&group->mark_mutex);
>>
>>          spin_lock(&destroy_lock);
>>          list_add(&mark->g_list, &destroy_list);
>>          spin_unlock(&destroy_lock);
>>          wake_up(&destroy_waitq);
>> -        /*
>> -         * We don't necessarily have a ref on mark from caller so the above destroy
>> -         * may have actually freed it, unless this group provides a 'freeing_mark'
>> -         * function which must be holding a reference.
>> -         */
>> -
>> -        /*
>> -         * Some groups like to know that marks are being freed.  This is a
>> -         * callback to the group function to let it know that this mark
>> -         * is being freed.
>> -         */
>> -        if (group->ops->freeing_mark)
>> -                group->ops->freeing_mark(mark, group);
>>
>>          /*
>>           * __fsnotify_update_child_dentry_flags(inode);
>> @@ -191,8 +175,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
>>           */
>>
>>          atomic_dec(&group->num_marks);
>> -
>> -        mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
>>  }
>>
>>  void fsnotify_destroy_mark(struct fsnotify_mark *mark,
>> @@ -205,7 +187,10 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
>>
>>  /*
>>   * Destroy all marks in the given list. The marks must be already detached from
>> - * the original inode / vfsmount.
>> + * the original inode / vfsmount. Note that we can race with
>> + * fsnotify_clear_marks_by_group_flags(). However we hold a reference to each
>> + * mark so they won't get freed from under us and nobody else touches our
>> + * free_list list_head.
>>   */
>>  void fsnotify_destroy_marks(struct list_head *to_free)
>>  {
>> @@ -406,7 +391,7 @@ struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
>>  }
>>
>>  /*
>> - * clear any marks in a group in which mark->flags & flags is true
>> + * Clear any marks in a group in which mark->flags & flags is true.
>>   */
>>  void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
>>                                           unsigned int flags)
>> @@ -460,6 +445,7 @@ static int fsnotify_mark_destroy(void *ignored)
>>  {
>>          struct fsnotify_mark *mark, *next;
>>          struct list_head private_destroy_list;
>> +        struct fsnotify_group *group;
>>
>>          for (;;) {
>>                  spin_lock(&destroy_lock);
>> @@ -471,6 +457,14 @@ static int fsnotify_mark_destroy(void *ignored)
>>
>>                  list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
>>                          list_del_init(&mark->g_list);
>> +                        group = mark->group;
>> +                        /*
>> +                         * Some groups like to know that marks are being freed.
>> +                         * This is a callback to the group function to let it
>> +                         * know that this mark is being freed.
>> +                         */
>> +                        if (group && group->ops->freeing_mark)
>> +                                group->ops->freeing_mark(mark, group);
>>                          fsnotify_put_mark(mark);
>>                  }
>
> With this patch, I got so many memleak notice,
>
> unreferenced object 0xffff880035bef640 (size 64):
>   comm "fsnotify_mark", pid 26, jiffies 4294673717 (age 628.737s)
>   hex dump (first 32 bytes):
>     28 36 3f 76 00 88 ff ff 28 36 3f 76 00 88 ff ff  (6?v....(6?v....
>     00 00 00 00 00 00 00 00 00 80 00 00 00 00 ad de  ................
>   backtrace:
>     [<ffffffff816cd34e>] kmemleak_alloc+0x4e/0xb0
>     [<ffffffff811ac6b5>] __kmalloc+0x1e5/0x290
>     [<ffffffff81204f25>] inotify_handle_event+0x75/0x160
>     [<ffffffff81205abc>] inotify_ignored_and_remove_idr+0x5c/0x80
>     [<ffffffff8120505e>] inotify_freeing_mark+0xe/0x10
>     [<ffffffff81203ca6>] fsnotify_mark_destroy+0xb6/0x150
>     [<ffffffff810a4487>] kthread+0xd7/0xf0
>     [<ffffffff816d92df>] ret_from_fork+0x3f/0x70
>     [<ffffffffffffffff>] 0xffffffffffffffff
>
> It is caused by ->freeing_mark() insert an event to group,
> but fsnotify_put_mark() kfree the group without free the event.

Yep. I see the same leak. Now inotify_freeing_mark() is called after
fsnotify_flush_notify() -- nobody releases these events anymore.

>
> thanks,
> Kinglong Mee
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
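
To make the ordering problem concrete, here is a rough userspace sketch of
the leak as I understand it from the backtrace above. Nothing in it is
kernel code: struct group, struct event, freeing_mark(), flush_notify() and
teardown() are hypothetical stand-ins, and the only behaviour it assumes is
that the callback queues one event on the group and that the flush is the
only place queued events get released.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct event {
	struct event *next;
};

struct group {
	struct event *queue;	/* pending notification events */
};

static int live_events;		/* events allocated and not yet freed */

/* Stand-in for a ->freeing_mark() callback that queues one event. */
static void freeing_mark(struct group *g)
{
	struct event *ev = malloc(sizeof(*ev));

	if (!ev)
		return;
	ev->next = g->queue;
	g->queue = ev;
	live_events++;
}

/* Stand-in for the flush step: release every queued event. */
static void flush_notify(struct group *g)
{
	while (g->queue) {
		struct event *ev = g->queue;

		g->queue = ev->next;
		free(ev);
		live_events--;
	}
}

/*
 * Tear down a group with the callback either before or after the flush.
 * Returns how many events would still be queued when the group is freed,
 * or -1 if the group could not be allocated.
 */
static int teardown(int callback_before_flush)
{
	struct group *g = calloc(1, sizeof(*g));
	int leaked;

	if (!g)
		return -1;
	live_events = 0;

	if (callback_before_flush)
		freeing_mark(g);
	flush_notify(g);
	if (!callback_before_flush)
		freeing_mark(g);

	leaked = live_events;	/* what a plain kfree(group) would strand */

	flush_notify(g);	/* tidy up so the sketch itself leaks nothing */
	free(g);
	return leaked;
}
```

With the callback run before the flush, teardown(1) reports nothing left
behind; with the callback run after the flush, teardown(0) reports the one
event that nobody releases, which matches the kmemleak report in spirit.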