There are reports of high io_uring submission latency for ext4 and xfs,
which is due to iomap not propagating the nowait flag to the block
layer, resulting in the submission path blocking on request tag
allocation. Because of how errors are propagated back, we can't set
REQ_NOWAIT for multi-bio IO; in this case return -EAGAIN and let the
caller handle it, for example by reissuing from a blocking context.
This is aligned with how raw bdev direct IO handles it.

Cc: stable@xxxxxxxxxxxxxxx
Link: https://github.com/axboe/liburing/issues/826#issuecomment-2674131870
Reported-by: wu lei <uwydoc@xxxxxxxxx>
Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
---
v2: Fail multi-bio nowait submissions

 fs/iomap/direct-io.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b521eb15759e..07c336fdf4f0 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -363,9 +363,14 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	 */
 	if (need_zeroout ||
 	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
-	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
+	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) {
 		dio->flags &= ~IOMAP_DIO_CALLER_COMP;
 
+		if (!is_sync_kiocb(dio->iocb) &&
+		    (dio->iocb->ki_flags & IOCB_NOWAIT))
+			return -EAGAIN;
+	}
+
 	/*
 	 * The rules for polled IO completions follow the guidelines as the
 	 * ones we set for inline and deferred completions. If none of those
@@ -374,6 +379,23 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	if (!(dio->flags & (IOMAP_DIO_INLINE_COMP|IOMAP_DIO_CALLER_COMP)))
 		dio->iocb->ki_flags &= ~IOCB_HIPRI;
 
+	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic);
+
+	if (!is_sync_kiocb(dio->iocb) && (dio->iocb->ki_flags & IOCB_NOWAIT)) {
+		/*
+		 * This is nonblocking IO, and we might need to allocate
+		 * multiple bios. In this case, as we cannot guarantee that
+		 * one of the sub bios will not fail getting issued FOR NOWAIT
+		 * and as error results are coalesced across all of them, ask
+		 * for a retry of this from blocking context.
+		 */
+		if (bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS + 1) >
+				BIO_MAX_VECS)
+			return -EAGAIN;
+
+		bio_opf |= REQ_NOWAIT;
+	}
+
 	if (need_zeroout) {
 		/* zero out from the start of the block to the write offset */
 		pad = pos & (fs_block_size - 1);
@@ -383,8 +405,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 			goto out;
 	}
 
-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic);
-
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
 		size_t n;
-- 
2.48.1
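
For illustration, here is a minimal sketch of the caller-side fallback this
change relies on: attempt the submission with IOCB_NOWAIT first and, on
-EAGAIN, reissue from a context that is allowed to block (io_uring does this
via its io-wq workers). The helpers do_dio_nowait() and do_dio_blocking()
are hypothetical stand-ins, not existing kernel APIs:

	/*
	 * Hypothetical issuer: try a nonblocking submission first and fall
	 * back to a sleepable context when the lower layer returns -EAGAIN,
	 * e.g. because the request would need more than one bio.
	 */
	static ssize_t issue_dio(struct kiocb *iocb, struct iov_iter *iter)
	{
		ssize_t ret;

		iocb->ki_flags |= IOCB_NOWAIT;
		ret = do_dio_nowait(iocb, iter);
		if (ret != -EAGAIN)
			return ret;

		/* Retry without IOCB_NOWAIT from a context that may block. */
		iocb->ki_flags &= ~IOCB_NOWAIT;
		return do_dio_blocking(iocb, iter);
	}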