On 9/9/21 9:24 PM, Al Viro wrote:
> On Fri, Sep 10, 2021 at 03:15:35AM +0000, Al Viro wrote:
>> On Thu, Sep 09, 2021 at 09:06:58PM -0600, Jens Axboe wrote:
>>> On 9/9/21 8:48 PM, Al Viro wrote:
>>>> On Thu, Sep 09, 2021 at 07:35:13PM -0600, Jens Axboe wrote:
>>>>
>>>>> Yep ok I follow you now. And yes, if we get a partial one but one that
>>>>> has more consumed than what was returned, that would not work well. I'm
>>>>> guessing that a) we've never seen that, or b) we always end up with
>>>>> either correctly advanced OR fully advanced, and the fully advanced case
>>>>> would then just return 0 next time and we'd just get a short IO back to
>>>>> userspace.
>>>>>
>>>>> The safer way here would likely be to import the iovec again. We're
>>>>> still in the context of the original submission, and the sqe hasn't been
>>>>> consumed in the ring yet, so that can be done safely.
>>>>
>>>> ... until you end up with something assuming that you've got the same
>>>> iovec from userland the second time around.
>>>>
>>>> IOW, generally it's a bad idea to do that kind of re-imports.
>>>
>>> That's really no different than having one thread do the issue, and
>>> another modify the iovec while it happens. It's only an issue if you
>>> don't validate it, just like you did the first time you imported. No
>>> assumptions need to be made here.
>>
>> It's not "need to be made", it's "will be mistakenly made by
>> somebody several years down the road"...
>
> E.g. somebody blindly assuming that the amount of data read the last
> time around will not exceed the size of reimported iov_iter. What I'm
> saying is that there's a plenty of ways to fuck up in that direction,
> and they will *not* be caught by normal fuzzers.

If the plan pans out, it's literally doing the _exact_ same thing that we
did originally. No assumptions are made about the contents of the iovecs
originally passed in, none of that state is reused. It's an identical
import to what was originally done.
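
To make the two positions concrete, here is a minimal userspace sketch of
a validated re-import. It is not the io_uring code: `struct sketch_iter`,
`sketch_import()`, `sketch_reimport()`, `SKETCH_MAX_SEGS` and the choice
of error codes are all hypothetical, and only `struct iovec` is real. The
retry path runs the same import and validation as the first time, and it
checks the byte count from the previous attempt against the fresh total
instead of assuming the array still has the same size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define SKETCH_MAX_SEGS 8	/* hypothetical cap, plays the role of UIO_MAXIOV */

/* Hypothetical stand-in for an imported iterator: a private copy of the
 * segment array plus the number of bytes it still describes. */
struct sketch_iter {
	struct iovec segs[SKETCH_MAX_SEGS];
	size_t nr_segs;
	size_t count;
};

/* Import: snapshot the caller's array first, then validate the snapshot.
 * Nothing from an earlier import is consulted. */
static int sketch_import(const struct iovec *user, size_t nr,
			 struct sketch_iter *it)
{
	size_t total = 0;

	if (nr > SKETCH_MAX_SEGS)
		return -EINVAL;
	if (nr)
		memcpy(it->segs, user, nr * sizeof(*user));
	for (size_t i = 0; i < nr; i++) {
		if (it->segs[i].iov_len > SIZE_MAX - total)
			return -EINVAL;	/* total length would overflow */
		total += it->segs[i].iov_len;
	}
	it->nr_segs = nr;
	it->count = total;
	return 0;
}

/* Retry: identical import, then skip the `done` bytes already transferred.
 * `done` comes from the previous attempt, so it is checked against the
 * fresh count rather than trusted. */
static int sketch_reimport(const struct iovec *user, size_t nr,
			   size_t done, struct sketch_iter *it)
{
	int ret = sketch_import(user, nr, it);

	if (ret)
		return ret;
	if (done > it->count)
		return -EFAULT;	/* array shrank since the first import */

	for (size_t i = 0; done > 0; i++) {
		size_t len = it->segs[i].iov_len;
		size_t step = done < len ? done : len;

		assert(i < it->nr_segs);	/* guaranteed by done <= count */
		if (step)
			it->segs[i].iov_base = (char *)it->segs[i].iov_base + step;
		it->segs[i].iov_len -= step;
		it->count -= step;
		done -= step;
	}
	return 0;
}
```

The `done > it->count` test is the one that is easy to leave out: without
it the loop would walk past the end of the re-imported array once
userland has shrunk it between the two imports.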

I'm not saying it's trivial, but as long as the context is correct, then
it really should be pretty straightforward...

> I'm not arguing in favour of an unconditional copy, BTW - I would like
> to see something resembling profiling data, but it's obviously not a
> pretty solution.

I can tell you right now that it's unworkable, it'll be a very
noticeable slowdown. And it's very much a case of doing the slow path
for the extreme corner case of ever hitting this case. For most
workloads, you'll _never_ hit it. But we obviously have to be able to do
it, for the slower cases (like SCSI with low QD, it'd trigger pretty
easily).

-- 
Jens Axboe