On Fri, Mar 8, 2019 at 3:36 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > It was bogus garbage in fs/aio.c, and honestly, looking at how much of > the logic looks suspiciously very similar here, I suspect it's bogus > garbage here too. Apart from that "this really looks suspicious" I actually like what the refcounting code does. The code looks like it took refcounting seriously, and the context refcounting and setup synchronization looks like it should work with that whole registration logic etc. But if the low-level request refcount is off, all bets are off. And I do think it looks like the basic io_kiocb refcounting is wrong, and that refcount_set(&req->refs, 0); in io_get_req() really is a big big red flag for "Oh, I'm not doing my refcount properly". It seems to be an optimization for "I only have a single refcount, and I don't want to have that expensive atomic decrement". But that automatically means that the whole "req" pointer is now really really dangerous, because if IO finishes early, it will be free'd immediately, possibly even before the synchronous part of submission is all done. And, as mentioned, it can free the file (and thus the inode) before the synchronous part is even done, which can cause really subtle problems. I think the fix is the same that fs/aio.c is getting: just always initialize the request refcount to 2: one for the submission side, and one for the completion side. Initializing a refcount to 0 is basically always a bug. You can't return a structure that has a zero refcount - by definition you'd better have your own reference on it. Linus