On 9/14/21 12:45 PM, Linus Torvalds wrote: > On Tue, Sep 14, 2021 at 7:18 AM Jens Axboe <axboe@xxxxxxxxx> wrote: >> >> >> + iov_iter_restore(iter, state); >> + > ... >> rw->bytes_done += ret; >> + iov_iter_advance(iter, ret); >> + if (!iov_iter_count(iter)) >> + break; >> + iov_iter_save_state(iter, state); > > Ok, so now you keep iovb_iter and the state always in sync by just > always resetting the iter back and then walking it forward explicitly > - and re-saving the state. > > That seems safe, if potentially unnecessarily expensive. Right, it's not ideal if it's a big range of IO, then it'll definitely be noticeable. But not too worried about it, at least not for now... > I guess re-walking lots of iovec entries is actually very unlikely in > practice, so maybe this "stupid brute-force" model is the right one. Not sure what the alternative is here. We could do something similar to __io_import_fixed() as we're only dealing with iter types where we can do that, but probably best left as a later optimization if it's deemed necessary. > I do find the odd "use __state vs rw->state" to be very confusing, > though. Particularly in io_read(), where you do this: > > + iov_iter_restore(iter, state); > + > ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true); > if (ret2) > return ret2; > > iovec = NULL; > rw = req->async_data; > - /* now use our persistent iterator, if we aren't already */ > - iter = &rw->iter; > + /* now use our persistent iterator and state, if we aren't already */ > + if (iter != &rw->iter) { > + iter = &rw->iter; > + state = &rw->iter_state; > + } > > do { > - io_size -= ret; > rw->bytes_done += ret; > + iov_iter_advance(iter, ret); > + if (!iov_iter_count(iter)) > + break; > + iov_iter_save_state(iter, state); > > > Note how it first does that iov_iter_restore() on iter/state, buit > then it *replaces&* the iter/state pointers, and then it does > iov_iter_advance() on the replacement ones. We restore the iter so it's the same as before we did the read_iter call, and then setup a consistent copy of the iov/iter in case we need to punt this request for retry. rw->iter should have the same state as iter at this point, and since rw->iter is the copy we'll use going forward, we're advancing that one in case ret > 0. The other case is that no persistent state is needed, and then iter remains the same. I'll take a second look at this part and see if I can make it a bit more straight forward, or at least comment it properly. -- Jens Axboe