On Thu, Nov 28, 2013 at 09:23:01PM +0000, Al Viro wrote:
> On Thu, Nov 28, 2013 at 04:26:18PM +0000, Al Viro wrote:
> > On Wed, Nov 27, 2013 at 02:09:06AM -0800, Christoph Hellwig wrote:
> >
> > > Also if you want to look me into something else feel free - it's very
> > > reproducible here.  Wish I could be more help here, but with all the
> > > RCU and micro optimizations in the path lookup code I can't claim to
> > > really understand it anymore.
> >
> > OK, I've been able to reproduce it and I see at least a part of what's
> > going on, but...
> >
> > What happens is that we get path_init() race with something and leave
> > us with nd->path pointing to what used to be pwd but has become a
> > negative dentry in process.
> >
> > AFAICS, it *was* borderline possible to hit before now:
> >
> > process A and B are CLONE_FS threads and are chdired to /tmp/foo
> > A asks for e.g. readlink() on bar
> > in path_init() we'd got nd->path (at /tmp/foo) and nd->seq; we are
> > in LOOKUP_RCU mode, so nd->path isn't pinned.
> > B chdirs them both to /tmp, leaving /tmp/foo not busy
> > C rmdirs /tmp/foo
> > A sets nd->inode to nd->path.dentry->d_inode, but this sucker has gone
> > negative now.  Sure, nd->seq doesn't match anymore, but that doesn't
> > do us any good - the first thing we'll do in link_path_walk() is
> > may_lookup(nd) and it'll blow on attempt to call inode_permission() for
> > nd->inode.
> >
> > What I still do not understand is how the devil is similar race actually
> > triggered during shutdown.  Digging through that right now...
> >
> > Anyway, verifying that this is what's going on for particular reproducer
> > is easy - add WARN_ON(!nd->inode) in the very end of path_init() and
> > see if it triggers.
>
> *grumble*
>
> Looks like adding if (!nd->inode) { a bunch of printks } in the end of
> path_init() makes the sucker disappear (so far 2 times out of 2, and
> with a test run taking a bit under two hours, well...)  The plain
> WARN_ON(!nd->inode) in that place triggers just fine.

I usually find that when printk() makes race conditions go away,
switching to tracepoints works better. It's still not as reliable as
when the debug code is not there, but it seems to perturb race
conditions a lot less.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
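
A rough userspace sketch of why tracepoints tend to perturb timing less than printk(), assuming the usual explanation: printk() formats text and may push it out to a console, while a tracepoint mostly stores a small record into an in-memory ring buffer. The names below (evlog_record, evlog_dump, EVLOG_SIZE) are hypothetical and this is not the kernel tracepoint API. It only illustrates keeping the hot path down to one atomic increment and two stores, with all formatting deferred until after the racy window.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace analogy; NOT the kernel tracepoint API. */
#define EVLOG_SIZE 1024u
_Static_assert((EVLOG_SIZE & (EVLOG_SIZE - 1)) == 0,
               "EVLOG_SIZE must be a power of two");

struct evlog_entry {
    uint32_t id;   /* caller-chosen event id */
    uint64_t arg;  /* one word of context, e.g. a pointer value */
};

static struct evlog_entry evlog_buf[EVLOG_SIZE];
static atomic_uint evlog_head;  /* total number of events recorded so far */

/*
 * Hot path: one relaxed atomic increment and two plain stores.
 * No formatting, no locks, no I/O, so the timing of the code under
 * test changes far less than it would with a printf()-style call.
 */
static inline void evlog_record(uint32_t id, uint64_t arg)
{
    unsigned slot = atomic_fetch_add_explicit(&evlog_head, 1u,
                                              memory_order_relaxed);
    struct evlog_entry *e = &evlog_buf[slot & (EVLOG_SIZE - 1)];

    e->id = id;
    e->arg = arg;
}

/*
 * Cold path: format and print the most recent entries, oldest first.
 * Meant to be called once the writers are quiescent; a dump that races
 * with evlog_record() may see half-written entries.
 * Returns the number of entries printed.
 */
static unsigned evlog_dump(FILE *out)
{
    assert(out != NULL);

    unsigned head = atomic_load_explicit(&evlog_head, memory_order_relaxed);
    unsigned count = head < EVLOG_SIZE ? head : EVLOG_SIZE;
    unsigned start = head - count;

    for (unsigned i = 0; i < count; i++) {
        const struct evlog_entry *e =
            &evlog_buf[(start + i) & (EVLOG_SIZE - 1)];
        fprintf(out, "event %u arg=%#llx\n",
                (unsigned)e->id, (unsigned long long)e->arg);
    }
    return count;
}
```

In the racy path one would call evlog_record() where the printf()-style debug call used to be, and call evlog_dump(stderr) only after the test run has finished.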