On 12/6/20 7:21 PM, Dave Chinner wrote: > On Fri, Dec 04, 2020 at 05:44:56PM +0800, Hao Xu wrote: >> Currently, IOCB_NOWAIT is ignored in Direct IO, REQ_NOWAIT is only set >> when IOCB_HIPRI is set. But REQ_NOWAIT should be set as well when >> IOCB_NOWAIT is set. >> >> Suggested-by: Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx> >> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> >> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx> >> --- >> >> Hi all, >> I tested fio io_uring direct read for a file on ext4 filesystem on a >> nvme ssd. I found that IOCB_NOWAIT is ignored in iomap layer, which >> means REQ_NOWAIT is not set in bio->bi_opf. > > What iomap is doing is correct behaviour. IOCB_NOWAIT applies to the > filesystem behaviour, not the block device. > > REQ_NOWAIT can result in partial IO failures because the error is > only reported to the iomap layer via IO completions. Hence we can > split a DIO into multiple bios and have random bios in that IO fail > with EAGAIN because REQ_NOWAIT is set. This error will > get reported to the submitter via completion, and it will override > any of the partial IOs that actually completed. > > Hence, like the recently reported multi-mapping IOCB_NOWAIT bug > reported by Jens and fixed in commit 883a790a8440 ("xfs: don't allow > NOWAIT DIO across extent boundaries") we'll get silent partial > writes occurring because the second submitted bio in an IO can > trigger EAGAIN errors with partial IO completion having already > occurred. > > Further, we don't allow partial IO completion for DIO on XFS at all. > DIO must be completely submitted and completed or return an error > without having issued any IO at all. Hence using REQ_NOWAIT for > DIO bios is incorrect and not desirable. What you say makes total sense for a user using RWF_NOWAIT, but it doesn't make a lot of sense for io_uring where we really want IOCB_NOWAIT to be what it suggests it is - don't wait for other IO to complete, if avoidable. One of the things that really suck with aio/libai is the "yeah it's probably async, but lol, might not be" aspect of it. For io_uring, if we do get -EAGAIN, we'll retry without NOWAIT set. So the concern about fractured/short writes doesn't bubble up to the application. Hence we really want an IOCB_NOWAIT_REALLY on that side, instead of the poor mans IOCB_MAYBE_NOWAIT semantics. -- Jens Axboe