Hi, On Wed 24-06-15 00:30:16, Lino Sanfilippo wrote: > On 23.06.2015 12:25, Jan Kara wrote: > > On Mon 22-06-15 16:23:16, Ashish Sangwan wrote: > >> For deleting the fsnotify_mark related with an inode, there are 2 paths in the > >> kernel. When the inotify fd is closed, all the marks belonging to a group are > >> removed one by one in fsnotify_clear_marks_by_group_flags. Other path is when > >> the inode is removed from user space by unlink, fsnotify_destroy_mark is > >> called to delete a single mark. > >> There is a race between these 2 paths which is caused due to the temporary > >> release of the mark_mutex inside fsnotify_destroy_mark_locked. > >> The race happen when the inotify app monitoring the file(s) exits, triggering > >> fsnotify_clear_marks_by_group_flags to delete the marks. > >> This function use lmark pointer to move to the next node after a safe removal > >> of the node. In parallel, if there is rm call for a file and such that the > >> lmark is pointing to the mark which is removed by this rm call, lmark ends up > >> pointing to a freed memory. Now, when we try to move to the next node using > >> lmark, it triggers an invalid virtual address crash. > >> Although fsnotify_clear_marks_by_group_flags and fsnotify_destroy_mark are > >> synchronized by mark_mutex, but both of these functions call > >> fsnotify_destroy_mark_locked which release the mark_mutex and acquire it again > >> creating a subtle race window. There seems to be no reason for releasing > >> mark_mutex, so this patch remove the mutex_unlock call. > > > > Thanks for report and the analysis. I agree with your problem analysis. > > Indeed the loop in fsnotify_clear_marks_by_group_flags() isn't safe against > > us dropping the mark_mutex inside fsnotify_destroy_mark_locked(). However > > mark_mutex is dropped in fsnotify_destroy_mark_locked() for a purpose. We > > call ->freeing_mark() callback from there and that should be called without > > mark_mutex. In particular inotify uses this callback to send the IN_IGNORE > > event and that code certainly isn't prepared to be called under mark_mutex > > and you likely introduce interesting deadlock possibilities there. > > Why dont we call freeing_mark() from the "fsnotify_mark"-thread instead > of fsnotify_destroy_mark_locked()? So there would not be a reason for > this temporary unlock any longer and we could close that race as Ashish > suggested. We could do that as well. But I'd prefer to keep sending the IN_IGNORED event from the context of the process destroying the mark (not that I would be aware of any strong reason why that must happen but it just seems more natural). Also the event from destruction thread will be sent with a delay caused by synchronize_srcu(). Finally one long critical section for destruction of all marks belonging to a group doesn't seem ideal either. Anyway, I'll have this possibility in mind when implementing some solution. Maybe it will be the most elegant way... Honza -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html