On Sat, Oct 29, 2016 at 05:20:17PM +0200, Christoph Hellwig wrote: > On Sat, Oct 29, 2016 at 01:24:51PM +0100, Al Viro wrote: > > How about taking this chunk (i.e. telling lockdep that we are not holding this > > thing) past the iter_op() call, where file_end_write() used to be? > > We can't as that would not fix the use after free (at least for the lockdep > case - otherwise the call is a no-op). Once iter_op returns aio_complete > might have dropped our reference to the file, and another thread might > have closed the fd so that the fput from aio_complete was the last one. > > This is something that xfstests/323 can reproduce under the right conditions. Eww... IOW, as soon as we'd submitted an async iocb, nobody can so much as look at struct file *or* iocb, right? Or underlying inode, or any fs-private data structures attached to it... I certainly agree that it's a bug, but IMO you are just papering over it. Just look at e.g. written = mapping->a_ops->direct_IO(iocb, &data); /* * Finally, try again to invalidate clean pages which might have been * cached by non-direct readahead, or faulted in by get_user_pages() * if the source of the write was an mmap'ed region of the file * we're writing. Either one is a pretty crazy thing to do, * so we don't support it 100%. If this invalidation * fails, tough, the write still worked... */ if (mapping->nrpages) { invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end); } in generic_file_direct_write() - who said that mapping doesn't point into freed memory by that point? Or that inode_lock(inode); ret = generic_write_checks(iocb, from); if (ret > 0) ret = __generic_file_write_iter(iocb, from); inode_unlock(inode); in generic_file_write_iter() won't blow up on inode_unlock() of a freed inode? Or that anything of that sort won't happen in other ->write_iter instances, for that matter? NAK, with apologies for not having looked at that earlier. The bug is real, all right, but this is not a solution - both incomplete and far too brittle. Why do we play that kind of insane games, anyway? Why not e.g. refcount the (async) iocb and have kiocb_free() drop the reference, with io_submit_one() holding one across the call of aio_run_iocb()? Cacheline bouncing issues? Anything more subtle? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html