On 9/13/21 5:23 PM, Linus Torvalds wrote: > On Mon, Sep 13, 2021 at 3:43 PM Jens Axboe <axboe@xxxxxxxxx> wrote: >> >> Al, Linus, are you OK with this? I think we should get this in for 5.15. >> I didn't resend the whole series, just a v2 of patch 1/3 to fix that bvec >> vs iovec issue. Let me know if you want the while thing resent. > > So I'm ok with the iov_iter side, but the io_uring side seems still > positively buggy, and very confused. > > It also messes with the state in bad ways and has internal knowledge. > And some of it looks completely bogus. > > For example, I see > > state->count -= ret; > rw->bytes_done += ret; > > and I go "that's BS". There's no way it's ok to start messing with the > byte count inside the state like that. That just means that the state > is now no longer the saved state, and it's some random garbage. > > I also think that the "bytes_done += ret" is a big hint there: any > time you restore the iovec state, you should then forward it by > "bytes_done". But that's not what the code does. > > Instead, it will now restore the iovec styate with the *wrong* number > of bytes remaining, but will start from the beginning of the iovec. > > So I think the fs/io_uring.c use of this state buffer is completely wrong. > > What *may* be the right thing to do is to > > (a) not mess with state->count > > (b) when you restore the state you always use > > iov_iter_restore(iter, state, bytes_done); > > to actually restore the *correct* state. > > Because modifying the iovec save state like that cannot be right, and > if it's right it's still too ugly and fragile for words. That save > state should be treated as a snapshot, not as a random buffer that you > can make arbitrary changes to. > > See what I'm saying? OK, for the do while loop itself, I do think we should be more consistent and that would also get rid of the state->count manipulation. I do agree that messing with that state is not something that should be done, and we can do this more cleanly and consistently instead. Once we hit the do {} while loop, state should be &rw->state and we can consistently handle it that way. Let me rework that bit and run the tests, and I'll post a v2 tomorrow. Thanks for taking a closer look. -- Jens Axboe