Re: [git pull] iov_iter fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/9/21 1:37 PM, Linus Torvalds wrote:
> On Wed, Sep 8, 2021 at 9:24 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>
>>         Fixes for io-uring handling of iov_iter reexpands
> 
> Ugh.
> 
> I have pulled this, because I understand what it does and I agree it
> fixes a bug, but it really feels very very hacky and wrong to me.
> 
> It really smells like io-uring is doing a "iov_iter_revert()" using a
> number that it pulls incorrectly out of its arse.
> 
> So when io-uring does that
> 
>                 iov_iter_revert(iter, io_size - iov_iter_count(iter));
> 
> what it *really* wants to do is just basically "iov_iter_reset(iter)".
> 
> And that's basically what that addition of that "iov_iter_reexpand()"
> tries to effectively do.
> 
> Wouldn't it be better to have a function that does exactly that?

That might indeed be better. Alternatively, consumers that truncate
should expand. Part of the problem here is the inconsistency in how they
are consumed.

> Alternatively (and I'm cc'ing Jens) is is not possible for the
> io-uring code to know how many bytes it *actually* used, rather than
> saying that "ok, the iter originally had X bytes, now it has Y bytes,
> so it must have used X-Y bytes" which was actively wrong for the case
> where something ended up truncating the IO for some reason.

Not sure how we'd do that, outside of stupid tricks like copy the
iov_iter before we pass it down. But that's obviously not going to be
very efficient. Hence we're left with having some way to reset/reexpand,
even in the presence of someone having done truncate on it.

> Because I note that io-uring does that
> 
>         /* may have left rw->iter inconsistent on -EIOCBQUEUED */
>         iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter));
> 
> in io_resubmit_prep() too, and that you guys missed that it's the
> exact same issue, and needs that exact same iov_iter_reexpand().

I think you're right on that one, there's no difference between that use
case and the other two...

> That "req->result" is once again the *original* length, and the above
> code once again mis-handles the case of "oh, the iov got truncated
> because of some IO limit".
> 
> So I've pulled this, but I think it is
> 
>  (a) ugly nasty
> 
>  (b) incomplete and misses a case
> 
> and needs more thought. At the VERY least it needs that
> iov_iter_reexpand() in io_resubmit_prep() too, I think.
> 
> I'd like the comments expanded too. In particular that
> 
>                 /* some cases will consume bytes even on error returns */

That comment is from me, and it goes back a few years. IIRC, it was the
iomap or xfs code that I hit this with, but honestly I don't remember
all the details at this point. I can try and play with it and see if it
still reproduces.

> really should expand on the "some cases" thing, and why such an error
> isn't fatal buye should be retried asynchronously blindly like this?

That would certainly make it easier to handle, as we'd never need to
care at that point. Ideally, return 'bytes_consumed' or error. It might
have been a case of -EAGAIN after truncate, I'll have to dig a bit to
find it again. Outside of that error, we don't retry as there's no point
in doing so.

> Because I think _that_ is part of the fundamental issue here - the
> io_uring code tries to just blindly re-submit the whole thing, and it
> does it very badly and actually incorrectly.
> 
> Or am I missing something?

I think the key point here is re-figuring out where the
consumption-on-error comes from. If it just ends up being a truncated
iov, that's all good and fine. If not, that feels like a bug somewhere
else that needs fixing.

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux