On 9/9/21 11:54 PM, Pavel Begunkov wrote: > On 9/9/21 8:37 PM, Linus Torvalds wrote: >> On Wed, Sep 8, 2021 at 9:24 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: >>> >>> Fixes for io-uring handling of iov_iter reexpands >> >> Ugh. >> >> I have pulled this, because I understand what it does and I agree it >> fixes a bug, but it really feels very very hacky and wrong to me. > > Maybe was worded not too clearly, my apologies. > > >> It really smells like io-uring is doing a "iov_iter_revert()" using a >> number that it pulls incorrectly out of its arse. > > It's not invented by io_uring, > > filemap.c : generic_file_direct_[write,read]() > > do the same thing. Also, the block layer was not re-expanding before > ~5.12, so it looks it was possible to trigger a similar thing without > io_uring, but I haven't tried to reproduce. Was mentioned in the > cover-letter. > >> So when io-uring does that >> >> iov_iter_revert(iter, io_size - iov_iter_count(iter)); >> >> what it *really* wants to do is just basically "iov_iter_reset(iter)". >> >> And that's basically what that addition of that "iov_iter_reexpand()" >> tries to effectively do. >> >> Wouldn't it be better to have a function that does exactly that? >> >> Alternatively (and I'm cc'ing Jens) is is not possible for the >> io-uring code to know how many bytes it *actually* used, rather than >> saying that "ok, the iter originally had X bytes, now it has Y bytes, >> so it must have used X-Y bytes" which was actively wrong for the case >> where something ended up truncating the IO for some reason. >> >> Because I note that io-uring does that >> >> /* may have left rw->iter inconsistent on -EIOCBQUEUED */ >> iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter)); >> >> in io_resubmit_prep() too, and that you guys missed that it's the >> exact same issue, and needs that exact same iov_iter_reexpand(). > > Right. It was covered by v1-v2, which were failing requests with > additional fallback in v2 [1], but I dropped in v3 [2] because there > is a difference. Namely io_resubmit_prep() might be called deeply down > the stack, e.g. in the block layer. > > It was intended to get fixed once the first part is merged, and I do > believe that was the right approach, because there were certain > communication delays. The first version was posted a month ago, but > we missed the merged window. It appeared to me that if we get anything > more complex Dammit, apologies for the teared email. ... It was intended to get fixed once the first part is merged, and I do believe that was the right approach, because there were certain communication delays. The first version was posted a month ago, but we missed the merged window. It appeared to me that if anything more complex is posted, it would take another window to get it done. > [1] https://lkml.org/lkml/2021/8/12/620 > [2] https://lkml.org/lkml/2021/8/23/285 > >> >> That "req->result" is once again the *original* length, and the above >> code once again mis-handles the case of "oh, the iov got truncated >> because of some IO limit". >> >> So I've pulled this, but I think it is >> >> (a) ugly nasty >> >> (b) incomplete and misses a case >> >> and needs more thought. At the VERY least it needs that >> iov_iter_reexpand() in io_resubmit_prep() too, I think. >> >> I'd like the comments expanded too. In particular that >> >> /* some cases will consume bytes even on error returns */ >> >> really should expand on the "some cases" thing, and why such an error >> isn't fatal buye should be retried asynchronously blindly like this? >> >> Because I think _that_ is part of the fundamental issue here - the >> io_uring code tries to just blindly re-submit the whole thing, and it >> does it very badly and actually incorrectly. >> >> Or am I missing something? >> >> Linus >> > -- Pavel Begunkov