On 3/30/23 3:53 PM, Linus Torvalds wrote:
> On Thu, Mar 30, 2023 at 10:33 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> That said, there might be things to improve here. But that's a task
>> for another time.
>
> So I ended up looking at this, and funnily enough, the *compat*
> version of the "copy iovec from user" is actually written to be a lot
> more efficient than the "native" version.
>
> The reason is that the compat version has to load the data one field
> at a time anyway to do the conversion, so it open-codes the loop. And
> it does it all using the efficient "user_access_begin()" etc, so it
> generates good code.
>
> In contrast, the native version just does a "copy_from_user()" and
> then loops over the result to verify it. And that's actually pretty
> horrid. Doing the open-coded loop that fetches and verifies the iov
> entries one at a time should be much better.
>
> I dunno. That's my gut feel, at least. And it may explain why your
> "readv()" benchmark has "_copy_from_user()" much higher up than the
> "read()" case.
>
> Something like the attached *may* help.
>
> Untested - I only checked the generated assembly to see that it seems
> to be sane, but I might have done something stupid. I basically copied
> the compat code, fixed it up for non-compat types, and then massaged
> it a bit more.

That's a nice improvement - about 6% better for the single vec case,
and that's the full "benchmark". Here are the numbers in usec for the
read-zero. Lower is better, obviously.

-git
1793883 1809305 1782602 1777280 1803978 1798792
1791190 1802017 1804558 1813370 1807696 1785887
1785506 1789876 1780018 1793932 1803655 1798186

-git+patch
1685393 1685891 1688886 1679967 1687551 1693233
1684883 1688779 1682103 1684944 1686928 1687984
1686729 1687009 1684660 1687295 1684893 1685309

-- 
Jens Axboe