On Fri, 10 Jul 2015 14:54:44 +0200
William Dauchy <william@xxxxxxxxx> wrote:

> On Jul10 07:24, Jeff Layton wrote:
> > Huh. I'm stumped...
> >
> > These patches are pretty straightforward. We're just taking an extra
> > reference to the filp when running lock operations so that it doesn't
> > disappear before the replies can be processed (typically in the event
> > that a signal comes in while waiting on the reply). Given the odd stack
> > trace above, I have to wonder if there's some sort of memory scribble
> > going on.
>
> I also forgot to mention that I also had the following message before
> the trace:
>
> VFS: Close: file count is 0
>

Ok, that may be an important clue. From filp_close:

        if (!file_count(filp)) {
                printk(KERN_ERR "VFS: Close: file count is 0\n");
                return 0;
        }

...so it looks like there could be a use-after-free going on? Somehow
we're ending up with an actual close being done after the last
reference has already been put.

So, I suspect that the problem is with the second patch (the LOCKU one).
I'm not sure if it's responsible for that message, but one of the things
we do in __fput() is call locks_remove_flock, which can dip down into
the NFS unlock codepath. So if a file happened to have some flock locks
on it, then we could be taking a new reference to a file that has
already had its refcount go to zero.

I'll have to think about how best to deal with this, as I totally missed
it when I did the original analysis of the bug. For now it's probably
best to revert that patch (though I think the one for the setlk is
likely OK).

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
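
To make the suspected sequence concrete, here is a minimal userspace sketch
of it. This is not kernel code: struct fake_file and every fake_* function
are hypothetical stand-ins invented for illustration, and the "conditional"
variant is only one possible guard, not something proposed in this thread.
It assumes the unlock path takes a reference and later drops it, as the
LOCKU patch is described as doing above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file; not the kernel structure. */
struct fake_file {
    long count;        /* plays the role of the file refcount */
    bool has_flock;    /* a flock lock is still held on the file */
    int  teardowns;    /* how many times the final-put path ran */
    int  unlocks_sent; /* how many unlock requests were issued */
};

static void fake_put(struct fake_file *f, bool conditional);

/* Take a reference only if the count has not already reached zero. */
static bool fake_get_not_zero(struct fake_file *f)
{
    if (f->count == 0)
        return false;
    f->count++;
    return true;
}

/* Models the unlock path that pins the file while the request runs. */
static void fake_unlock(struct fake_file *f, bool conditional)
{
    bool held;

    if (conditional) {
        held = fake_get_not_zero(f); /* refuses to revive a dead file */
    } else {
        f->count++;                  /* unconditional: 0 -> 1 is possible */
        held = true;
    }

    f->unlocks_sent++;               /* pretend the unlock was sent */

    if (held)
        fake_put(f, conditional);    /* drop the reference taken above */
}

/* Models the final-put path: remove leftover flock locks on teardown. */
static void fake_teardown(struct fake_file *f, bool conditional)
{
    f->teardowns++;
    if (f->has_flock) {
        f->has_flock = false;
        fake_unlock(f, conditional);
    }
}

/* Drop one reference; the last put triggers teardown. */
static void fake_put(struct fake_file *f, bool conditional)
{
    assert(f->count > 0);
    if (--f->count == 0)
        fake_teardown(f, conditional);
}

/* Drop the last reference on a file that still holds a flock lock and
 * report how many times teardown ran. */
static int simulate_last_put(bool conditional)
{
    struct fake_file f = { .count = 1, .has_flock = true };

    fake_put(&f, conditional);
    return f.teardowns;
}
```

In this model, simulate_last_put(false) runs the teardown twice: the unlock
issued from the first teardown bumps the count from zero back to one, and
dropping that reference reaches zero a second time. With
simulate_last_put(true) the teardown runs once, because the unlock path
declines to take a reference on a file whose count is already zero.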