On 9/14/21 1:37 PM, Jens Axboe wrote:
> On 9/14/21 12:45 PM, Linus Torvalds wrote:
>> On Tue, Sep 14, 2021 at 7:18 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>>
>>> +               iov_iter_restore(iter, state);
>>> +
>> ...
>>>                 rw->bytes_done += ret;
>>> +               iov_iter_advance(iter, ret);
>>> +               if (!iov_iter_count(iter))
>>> +                       break;
>>> +               iov_iter_save_state(iter, state);
>>
>> Ok, so now you keep the iov_iter and the state always in sync by just
>> always resetting the iter back and then walking it forward explicitly
>> - and re-saving the state.
>>
>> That seems safe, if potentially unnecessarily expensive.
>
> Right, it's not ideal if it's a big range of IO; then it'll definitely
> be noticeable. But not too worried about it, at least not for now...
>
>> I guess re-walking lots of iovec entries is actually very unlikely in
>> practice, so maybe this "stupid brute-force" model is the right one.
>
> Not sure what the alternative is here. We could do something similar to
> __io_import_fixed(), as we're only dealing with iter types where we can
> do that, but it's probably best left as a later optimization if it's
> deemed necessary.
>
>> I do find the odd "use __state vs rw->state" to be very confusing,
>> though. Particularly in io_read(), where you do this:
>>
>> +       iov_iter_restore(iter, state);
>> +
>>         ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
>>         if (ret2)
>>                 return ret2;
>>
>>         iovec = NULL;
>>         rw = req->async_data;
>> -       /* now use our persistent iterator, if we aren't already */
>> -       iter = &rw->iter;
>> +       /* now use our persistent iterator and state, if we aren't already */
>> +       if (iter != &rw->iter) {
>> +               iter = &rw->iter;
>> +               state = &rw->iter_state;
>> +       }
>>
>>         do {
>> -               io_size -= ret;
>>                 rw->bytes_done += ret;
>> +               iov_iter_advance(iter, ret);
>> +               if (!iov_iter_count(iter))
>> +                       break;
>> +               iov_iter_save_state(iter, state);
>>
>>
>> Note how it first does that iov_iter_restore() on iter/state, but
>> then it *replaces* the iter/state pointers, and then it does
>> iov_iter_advance() on the replacement ones.
>
> We restore the iter so it's the same as before we did the read_iter
> call, and then set up a consistent copy of the iov/iter in case we need
> to punt this request for retry. rw->iter should have the same state as
> iter at this point, and since rw->iter is the copy we'll use going
> forward, we're advancing that one in case ret > 0.
>
> The other case is that no persistent state is needed, and then iter
> remains the same.
>
> I'll take a second look at this part and see if I can make it a bit
> more straightforward, or at least comment it properly.

I hacked up something that shortens the iter for the initial IO, so we
could more easily test the retry path and the state handling. It really
is a hack, but the idea is to issue 64K of IO from fio and have the
initial attempt truncated to anywhere from 4K to 60K, which forces a
retry. I ran this with both 16 segments and 8 segments, verifying that
it hits both the UIO_FASTIOV and the alloc path.

I did find one issue with that, see the last hunk in the hack: we need
to increment rw->bytes_done only if we don't break out of the loop, or
set ret to 0 if we do break. Otherwise that last ret ends up being
accounted twice. Apart from that, it passes data verification runs.
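Spelled out, the two equivalent ways to handle that would look roughly
like the below. This is just a sketch of the relevant retry loop in
io_read(), not the full thing; the hack below uses the first form.

        /* option 1: only account ret once we know we're going to retry */
        do {
                iov_iter_advance(iter, ret);
                if (!iov_iter_count(iter))
                        break;
                rw->bytes_done += ret;
                iov_iter_save_state(iter, state);
                ...

        /*
         * option 2: keep the increment first, but clear ret on break so
         * the completion side doesn't add it on top of bytes_done again
         */
        do {
                rw->bytes_done += ret;
                iov_iter_advance(iter, ret);
                if (!iov_iter_count(iter)) {
                        ret = 0;
                        break;
                }
                iov_iter_save_state(iter, state);
                ...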
diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc1ff47e3221..484c86252f9d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -744,6 +744,7 @@ enum {
         REQ_F_NOWAIT_READ_BIT,
         REQ_F_NOWAIT_WRITE_BIT,
         REQ_F_ISREG_BIT,
+        REQ_F_TRUNCATED_BIT,
 
         /* not a real bit, just to check we're not overflowing the space */
         __REQ_F_LAST_BIT,
@@ -797,6 +798,7 @@ enum {
         REQ_F_REFCOUNT          = BIT(REQ_F_REFCOUNT_BIT),
         /* there is a linked timeout that has to be armed */
         REQ_F_ARM_LTIMEOUT      = BIT(REQ_F_ARM_LTIMEOUT_BIT),
+        REQ_F_TRUNCATED         = BIT(REQ_F_TRUNCATED_BIT),
 };
 
 struct async_poll {
@@ -3454,11 +3456,12 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 {
         struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
         struct kiocb *kiocb = &req->rw.kiocb;
-        struct iov_iter __iter, *iter = &__iter;
+        struct iov_iter __i, __iter, *iter = &__iter;
         struct io_async_rw *rw = req->async_data;
         bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
         struct iov_iter_state __state, *state;
         ssize_t ret, ret2;
+        bool do_restore = false;
 
         if (rw) {
                 iter = &rw->iter;
@@ -3492,8 +3495,25 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
                 return ret;
         }
 
+        if (!(req->flags & REQ_F_TRUNCATED) && !(iov_iter_count(iter) & 4095)) {
+                int nr_vecs;
+
+                __i = *iter;
+                nr_vecs = 1 + (prandom_u32() % iter->nr_segs);
+                iter->nr_segs = nr_vecs;
+                iter->count = nr_vecs * 8192;
+                req->flags |= REQ_F_TRUNCATED;
+                do_restore = true;
+        }
+
         ret = io_iter_do_read(req, iter);
 
+        if (ret == -EAGAIN) {
+                req->flags &= ~REQ_F_TRUNCATED;
+                *iter = __i;
+                do_restore = false;
+        }
+
         if (ret == -EAGAIN || (req->flags & REQ_F_REISSUE)) {
                 req->flags &= ~REQ_F_REISSUE;
                 /* IOPOLL retry should happen for io-wq threads */
@@ -3513,6 +3533,9 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 
         iov_iter_restore(iter, state);
 
+        if (do_restore)
+                *iter = __i;
+
         ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
         if (ret2)
                 return ret2;
@@ -3526,10 +3549,10 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
         }
 
         do {
-                rw->bytes_done += ret;
                 iov_iter_advance(iter, ret);
                 if (!iov_iter_count(iter))
                         break;
+                rw->bytes_done += ret;
                 iov_iter_save_state(iter, state);
 
                 /* if we can retry, do so with the callbacks armed */

-- 
Jens Axboe