On Sat, Oct 29, 2016 at 12:07:07PM -0700, Linus Torvalds wrote: > On Sat, Oct 29, 2016 at 11:52 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > And that call can happen as soon as we return from __blockdev_direct_IO() > > (even earlier, actually). As soon as that happens, the reference to > > struct file we'd acquired in io_submit_one() is dropped. If descriptor > > table had been shared, another thread might have already closed that sucker, > > and fput() from aio_complete() would free struct file. > > But that's the point. We don't *do* anything like that any more. We > now always do the final access from aio_complete(). So it doesn't > matter if that is called asynchronously (very early) or not. > > That's the whole point of the patch. Exactly to do everything either > *before* we even submit it (at which point no completion can happen), > or doing it in aio_complete() which is guaranteed to be after the > submission. No races, no use-after-free. > > What am I missing? if (ret > 0) ret = __generic_file_write_iter(iocb, from); inode_unlock(inode); in generic_file_write_iter() (inode might be gone here) written = mapping->a_ops->direct_IO(iocb, &data); /* * Finally, try again to invalidate clean pages which might have been * cached by non-direct readahead, or faulted in by get_user_pages() * if the source of the write was an mmap'ed region of the file * we're writing. Either one is a pretty crazy thing to do, * so we don't support it 100%. If this invalidation * fails, tough, the write still worked... */ if (mapping->nrpages) { invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end); in generic_file_direct_write() (mapping points into inode, which might be freed) ret = __blockdev_direct_IO(iocb, inode, target->bt_bdev, &data, xfs_get_blocks_direct, NULL, NULL, 0); if (ret >= 0) { iocb->ki_pos += ret; iov_iter_advance(to, ret); } xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED); in xfs_file_dio_aio_read() (ip points to xfs-private part of inode). And that - just from a quick look. We *do* access inode between the call of __blockdev_direct_IO() and return from ->write_iter(). What's more, as soon as aio_complete() has happened, what's to stop close from another thread + umount + rmmod unmapping that ->write_iter() completely? AFAICS, the possibility of dropping the last reference to struct file before ->write_iter() has returned is fundamentally broken. I might be missing something subtle here, but... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html