On 11/18/18 6:57 PM, Dave Chinner wrote: > On Sat, Nov 17, 2018 at 04:53:17PM -0700, Jens Axboe wrote: >> Needs further work, but this should work fine on normal setups >> with a file system on a pollable block device. > > What should work fine? I've got no idea what this patch is actually > providing.... Sorry, should have made it more clear that this one is just a test patch to enable polled aio for files, it's not supposed to be considered complete. >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >> --- >> fs/aio.c | 2 ++ >> fs/direct-io.c | 4 +++- >> fs/iomap.c | 7 +++++-- >> 3 files changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/fs/aio.c b/fs/aio.c >> index 500da3ffc376..e02085fe10d7 100644 >> --- a/fs/aio.c >> +++ b/fs/aio.c >> @@ -1310,6 +1310,8 @@ static struct block_device *aio_bdev_host(struct kiocb *req) >> >> if (S_ISBLK(inode->i_mode)) >> return I_BDEV(inode); >> + else if (inode->i_sb && inode->i_sb->s_bdev) >> + return inode->i_sb->s_bdev; > > XFS might be doing AIO to files on real-time device, not > inode->i_sb->s_bdev. So this may well be the wrong block device > for the IO being submitted. See patch description, XFS doing AIO on a real-time device is not a "normal setup". It doesn't work for btrfs either, for instance. As far as I can tell, there are a few routes that we can go here: 1) Have an fs ops that returns the bdev. 2) Set it after submit time, like we do in other spots. This won't work if we straddle devices. I don't feel that strongly about it, and frankly I'm fine with just dropping non-bdev support for now. The patch is provided for testing for the 99% of use cases, which is a file system on top of a single device. >> diff --git a/fs/iomap.c b/fs/iomap.c >> index 74c1f37f0fd6..4cf412b6230a 100644 >> --- a/fs/iomap.c >> +++ b/fs/iomap.c >> @@ -1555,6 +1555,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos, >> struct page *page = ZERO_PAGE(0); >> int flags = REQ_SYNC | REQ_IDLE; >> struct bio *bio; >> + blk_qc_t qc; >> >> bio = bio_alloc(GFP_KERNEL, 1); >> bio_set_dev(bio, iomap->bdev); >> @@ -1570,7 +1571,9 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos, >> bio_set_op_attrs(bio, REQ_OP_WRITE, flags); >> >> atomic_inc(&dio->ref); >> - return submit_bio(bio); >> + qc = submit_bio(bio); >> + WRITE_ONCE(dio->iocb->ki_blk_qc, qc); >> + return qc; >> } > > Why is this added to sub-block zeroing IO calls? It gets overwritten > by the data IO submission, so this value is going to change as the > IO progresses. What does making these partial IOs visible provide, > especially as they then get overwritten by the next submissions? > Indeed, how does one wait on all IOs in the DIO to complete if we > are only tracking one of many? That does look like a mistake, the zeroing should just be ignored. We only care about the actual IO. -- Jens Axboe