Re: [PATCHSET v6b 0/11] Turn single segment imports into ITER_UBUF

Jens Axboe <axboe@xxxxxxxxx> · Sun, 2 Apr 2023 16:22:05 -0600

On 3/30/23 4:18 PM, Jens Axboe wrote:
> On 3/30/23 3:53 PM, Linus Torvalds wrote:
>> On Thu, Mar 30, 2023 at 10:33 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> That said, there might be things to improve here. But that's a task
>>> for another time.
>>
>> So I ended up looking at this, and funnily enough, the *compat*
>> version of the "copy iovec from user" is actually written to be a lot
>> more efficient than the "native" version.
>>
>> The reason is that the compat version has to load the data one field
>> at a time anyway to do the conversion, so it open-codes the loop. And
>> it does it all using the efficient "user_access_begin()" etc, so it
>> generates good code.
>>
>> In contrast, the native version just does a "copy_from_user()" and
>> then loops over the result to verify it. And that's actually pretty
>> horrid. Doing the open-coded loop that fetches and verifies the iov
>> entries one at a time should be much better.
>>
>> I dunno. That's my gut feel, at least. And it may explain why your
>> "readv()" benchmark has "_copy_from_user()" much higher up than the
>> "read()" case.
>>
>> Something like the attached *may* help.
>>
>> Untested - I only checked the generated assembly to see that it seems
>> to be sane, but I might have done something stupid. I basically copied
>> the compat code, fixed it up for non-compat types, and then massaged
>> it a bit more.
> 
> That's a nice improvement - about 6% better for the single vec case,
> and that's the full "benchmark". Here are the numbers in usec for
> the read-zero. Lower is better, obviously.

Linus, are you going to turn this into a proper patch? This is too
good to not pursue.
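
For anyone following along, here is a rough userspace sketch of the two
styles being compared above. It is not the attached patch and not the
kernel code: the function names are made up for illustration, memcpy()
stands in for copy_from_user(), plain loads stand in for the per-field
user accesses done inside a user_access_begin() section, and the only
validation modeled is rejecting a length that is negative when viewed
as ssize_t.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * "Native" style (hypothetical name): copy the whole array in one go,
 * then walk the copy a second time to validate it.
 */
static int import_bulk_then_check(struct iovec *dst,
				  const struct iovec *src, size_t nr)
{
	/* Stand-in for a single copy_from_user() of the whole array. */
	memcpy(dst, src, nr * sizeof(*dst));

	/* Second pass over the data we just copied. */
	for (size_t i = 0; i < nr; i++) {
		if ((ssize_t)dst[i].iov_len < 0)
			return -EINVAL;
	}
	return 0;
}

/*
 * Open-coded style (hypothetical name): fetch each entry field by
 * field, verify it, and store it, all in a single pass.
 */
static int import_one_at_a_time(struct iovec *dst,
				const struct iovec *src, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		/* Stand-ins for per-field loads from user memory. */
		void *base = src[i].iov_base;
		size_t len = src[i].iov_len;

		if ((ssize_t)len < 0)
			return -EINVAL;

		dst[i].iov_base = base;
		dst[i].iov_len = len;
	}
	return 0;
}
```

Both variants accept and reject the same inputs; the difference is only
that the second touches each entry once instead of twice. Whether that
accounts for the gain measured above is a guess on my part.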

-- 
Jens Axboe