On 3/30/23 4:18 PM, Jens Axboe wrote:
> On 3/30/23 3:53 PM, Linus Torvalds wrote:
>> On Thu, Mar 30, 2023 at 10:33 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>> That said, there might be things to improve here. But that's a task
>>> for another time.
>>
>> So I ended up looking at this, and funnily enough, the *compat*
>> version of the "copy iovec from user" is actually written to be a lot
>> more efficient than the "native" version.
>>
>> The reason is that the compat version has to load the data one field
>> at a time anyway to do the conversion, so it open-codes the loop. And
>> it does it all using the efficient "user_access_begin()" etc, so it
>> generates good code.
>>
>> In contrast, the native version just does a "copy_from_user()" and
>> then loops over the result to verify it. And that's actually pretty
>> horrid. Doing the open-coded loop that fetches and verifies the iov
>> entries one at a time should be much better.
>>
>> I dunno. That's my gut feel, at least. And it may explain why your
>> "readv()" benchmark has "_copy_from_user()" much higher up than the
>> "read()" case.
>>
>> Something like the attached *may* help.
>>
>> Untested - I only checked the generated assembly to see that it seems
>> to be sane, but I might have done something stupid. I basically copied
>> the compat code, fixed it up for non-compat types, and then massaged
>> it a bit more.
>
> That's a nice improvement - about 6% better for the single vec case,
> And that's the full "benchmark". Here are the numbers in usec for
> the read-zero. Lower is better, obviously.

Linus, are you going to turn this into a proper patch? This is too good
to not pursue.

-- 
Jens Axboe