On Mon, Apr 22, 2024 at 11:19 PM Anuj Gupta <anuj20.g@xxxxxxxxxxx> wrote:
>
> In case of write, the iov_iter gets updated before retry kicks in.
> Restore the iov_iter before retrying. It can be reproduced by issuing
> a write greater than device limit.
>
> Fixes: df604d2ad480 (io_uring/rw: ensure retry condition isn't lost)
>
> Signed-off-by: Anuj Gupta <anuj20.g@xxxxxxxxxxx>
> ---
>  io_uring/rw.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 4fed829fe97c..9fadb29ec34f 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -1035,8 +1035,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>  	else
>  		ret2 = -EINVAL;
>
> -	if (req->flags & REQ_F_REISSUE)
> +	if (req->flags & REQ_F_REISSUE) {
> +		iov_iter_restore(&io->iter, &io->iter_state);
>  		return IOU_ISSUE_SKIP_COMPLETE;
> +	}
>
>  	/*
>  	 * Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just
> --
> 2.25.1
>

Looking more into it, no write happens in case of retry. This is because
the first call to blkdev_direct_write advances the iter and updates the
count to 0. Since the I/O needs to be split, retry handling gets
triggered. We don't restore the iter, so the retry happens with count=0
and no I/O is issued.

This doesn't happen in case of read, as blkdev_read_iter reverts the
iter and restores the right count value [3].

NVMe device limit: [1]
Fio command used: [2]

[1]
# cat /sys/block/nvme0n1/queue/max_hw_sectors_kb
512

[2]
fio -iodepth=1 -rw=write -direct=1 -ioengine=io_uring -bs=1M -numjobs=1 \
    -offset=0 -size=1M -group_reporting -filename=/dev/nvme0n1 -name=io_uring

[3]
static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	if (iocb->ki_flags & IOCB_DIRECT) {
		ret = blkdev_direct_IO(iocb, to);
		if (ret >= 0) {
			iocb->ki_pos += ret;
			count -= ret;
		}
		iov_iter_revert(to, count - iov_iter_count(to));
		if (ret < 0 || !count)
			goto reexpand;
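
To make the sequence described above easier to follow, here is a small userspace model of it. This is only a sketch, not kernel code: every toy_* name is hypothetical, the iterator is reduced to an offset and a count, and it assumes (as a simplification) that the first, non-blocking attempt asks for a reissue whenever the request exceeds the device limit, while the blocking retry is allowed to split and complete.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_REISSUE (-1L)   /* hypothetical marker: "retry this request" */

/* Hypothetical stand-in for iov_iter: only tracks position and remaining bytes. */
struct toy_iter {
	size_t offset;  /* bytes already consumed */
	size_t count;   /* bytes still to transfer */
};

/* Hypothetical stand-in for the saved iterator state. */
struct toy_iter_state {
	size_t offset;
	size_t count;
};

static void toy_iter_save(const struct toy_iter *it, struct toy_iter_state *st)
{
	st->offset = it->offset;
	st->count = it->count;
}

static void toy_iter_restore(struct toy_iter *it, const struct toy_iter_state *st)
{
	it->offset = st->offset;
	it->count = st->count;
}

static void toy_iter_advance(struct toy_iter *it, size_t n)
{
	assert(n <= it->count);
	it->offset += n;
	it->count -= n;
}

/*
 * Models one direct-write attempt. The iterator is consumed up front, so
 * count drops to 0 even when the attempt ends up requesting a reissue.
 * Assumption: only the non-blocking attempt refuses oversized requests.
 */
static long toy_direct_write(struct toy_iter *it, size_t limit, int nowait)
{
	size_t len = it->count;

	toy_iter_advance(it, len);
	if (nowait && len > limit)
		return TOY_REISSUE;
	return (long)len;   /* bytes written */
}

/* Issue path: try non-blocking first, then retry, optionally restoring the iter. */
static long toy_issue_write(struct toy_iter *it, size_t limit, int restore_on_retry)
{
	struct toy_iter_state st;
	long ret;

	toy_iter_save(it, &st);
	ret = toy_direct_write(it, limit, 1);
	if (ret != TOY_REISSUE)
		return ret;

	if (restore_on_retry)
		toy_iter_restore(it, &st);   /* the step the patch adds */
	return toy_direct_write(it, limit, 0);
}
```

With a 1M request and a 512K limit, calling toy_issue_write() with restore_on_retry=0 retries with count=0 and writes nothing, while restore_on_retry=1 retries with the full count, which mirrors the difference the patch is meant to make.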