On Fri, Jan 6, 2017 at 12:43 PM, Jan Kara <jack@xxxxxxx> wrote: > Hello, > > This is the second revision of my patches to avoid SRCU stalls when fanotify > waits for response to permission events from userspace processes. Thanks > to Amir and Paul for review! > > Changes since v1: > * renamed fsnotify_mark_list to fsnotify_mark_connector and couple other > things > * updated some comments and changelogs to better explain what is going on > * made audit use inode pointer as a key again > * added Reviewed-by tags > * dropped two audit fixes that got already merged > * added cleanup of mark destruction functions > Went over the entire series again (also quick pass on the Reviewed patches). Looking good. Some minor nits on patches 6 and 9. > Patch set overview > ------------------ > > Currently, fanotify waits for response to a permission even from userspace > process while holding fsnotify_mark_srcu lock. That has a consequence that > when userspace process takes long to respond or does not respond at all, > fsnotify_mark_srcu period cannot ever complete blocking reclaim of any > notification marks and also blocking any process that did synchronize_srcu() > on fsnotify_mark_srcu. Effectively, this eventually blocks anybody interacting > with the notification subsystem. Miklos has some real world reports of this > happening. Although this in principle a problem of broken userspace > application (which futhermore has to have CAP_SYS_ADMIN in init_user_ns, so > it is not a security problem), it is still nasty that a simple error can > block the kernel like this. > > This patch set solves this problem. The basic idea of the solution is that > when fanotify needs to wait for response from userspace process, it grabs > reference to the mark which generated the event and drops fsnotify_mark_srcu > lock. When userspace responds, we grab fsnotify_mark_srcu again, drop > the mark reference, and continue iterating the list of marks attached to the > inode / vfsmount delivering the event to other notification groups. What > complicates this simple approach is that the mark for which we wait for > response has to stay pinned in the list of marks attached to the inode / > vfsmount so that we can resume iteration of the list when userspace responds > but on the other hand when the inode gets unlinked while we wait for userspace > reponse, we need to destroy the mark (or at least detach it from the inode). > > The first 3 patches contain some initial fixes and cleanups. Patches 4-6 > implement attaching of marks to inode / vfsmount via a dedicated structure > which allows us to detach list of marks from the object without having to > destroy the list itself. Patches 7-9 implement removal of mark from the > list of marks attached to an object when last mark reference is dropped. > Patches 10-13 then implement dropping of SRCU lock when waiting on response > from userspace. Patches 14-22 are mostly trivial cleanups that get rid of > trivial wrappers and one pointer in the mark structure. > > Patches have survived testing with inotify/fanotify tests in LTP. Did you already write an LTP test from Miklos' test program that ignited this work? Would you like to take this small task? My head hurts from thinking about all the corner cases we need to stress test with this work... but I'll try to pick myself up and write a list of test coverage as a starting point. > I didn't test > audit - Paul can you give these patches some testing? Since some of the > changes are really non-trivial, I'd welcome if someone reviewed the patch set. > Thanks! > > Finally, to ease experimenting with the patches I've pushed them out to > git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git for_testing > I was going to ask that from you. Thanks! Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html