On Wed 18-04-18 17:05:25, Robert Kolchmeyer wrote: > fsnotify() acquires a reference to a fsnotify_mark_connector through > the SRCU-protected pointer to_tell->i_fsnotify_marks. However, it > appears that no precautions are taken in fsnotify_put_mark() to > ensure that fsnotify() drops its reference to this > fsnotify_mark_connector before assigning a value to its 'destroy_next' > field. This can result in fsnotify_put_mark() assigning a value > to a connector's 'destroy_next' field right before fsnotify() tries to > traverse the linked list referenced by the connector's 'list' field. > Since these two fields are members of the same union, this behavior > results in a kernel panic. Ahh, I was reading this code just two days ago trying to find some issues (due to reports of softlockups in fsnotify()) but this didn't occur to me. Great spotting! > This issue is resolved by calling synchronize_srcu() before updating > the connector's 'destroy_next' field in fsnotify_put_mark(). Since the > relevant section of fsnotify() occurs in a SRCU read-side critical > section, this will force fsnotify_put_mark() to wait for fsnotify() > to finish operating on the connector before updating its 'destroy_next' > field. Since fsnotify_put_mark() removes references to the connector > before assigning its 'destroy_next' field, it shouldn't be possible for > another thread to acquire a reference to the connector while > fsnotify_put_mark() waits for fsnotify() to finish its work. > > The offending behavior here is extremely unlikely; since > fsnotify_put_mark() removes references to a connector (specifically, > it ensures that the connector is unreachable from the inode it was > formerly attached to) before updating its 'destroy_next' field, a > sizeable chunk of code in fsnotify_put_mark() has to execute in the > short window between when fsnotify() acquires the connector reference > and saves the value of its 'list' field. On the HEAD kernel, I've only > been able to reproduce this by inserting a udelay(1) in fsnotify(). > However, I've been able to reproduce this issue without inserting a > udelay(1) anywhere on older unmodified release kernels, so I believe > it's worth fixing at HEAD. Oh, the bug definitely needs fixing. I somewhat dislike synchronize_srcu() when destroying the connector though. The whole point of connector_destroy_list is to batch synchronize_srcu() calls and move them out of the relative fast path. So what I'd do instead is to move 'destroy_next' from a union with 'list' to a union with 'inode' and 'mnt'. Those are not safe to access under srcu anyway - you have to hold conn->lock to get them and use conn->flags to determine the type under the same lock. So in that union we are actually safe to use the space for anything once conn->flags are cleared by fsnotify_detach_connector_from_object(). Will you send a patch for that so that you get a proper credit for all the debugging you did? :) Thanks! Honza > Fixes bug 199437: https://bugzilla.kernel.org/show_bug.cgi?id=199437 > > Fixes: 08991e83b7286635167bab40927665a90fb00d81 > Signed-off-by: Robert Kolchmeyer <rkolchmeyer@xxxxxxxxxx> > --- > fs/notify/mark.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/notify/mark.c b/fs/notify/mark.c > index e9191b416434..358fc7de1e86 100644 > --- a/fs/notify/mark.c > +++ b/fs/notify/mark.c > @@ -227,6 +227,7 @@ void fsnotify_put_mark(struct fsnotify_mark *mark) > iput(inode); > > if (free_conn) { > + synchronize_srcu(&fsnotify_mark_srcu); > spin_lock(&destroy_lock); > conn->destroy_next = connector_destroy_list; > connector_destroy_list = conn; > -- > 2.17.0.484.g0c8726318c-goog > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR