On Thu, Jun 09, 2022 at 09:10:04PM +0200, Sedat Dilek wrote:
> So Mel gave me the idea to simply measure how fast the function becomes.
>
> ...
>
> My SandyBridge-CPU has no FSRM feature, so I'm unsure if I really
> benefit from your changes.

What does it have to do with FSRM?

> My test-cases:
>
> 1. LC_ALL=C dd if=/dev/zero of=/dev/null bs=1M count=1M status=progress
>
> 2. perf bench mem memcpy (with Debian's perf v5.18 and a selfmade v5.19-rc1)
>
> First test-case shows no measurable/noticable differences.

No surprise - you hit read() once and write() once per 1MB worth of
clear_user().  If the overhead in new_sync_{read,write}() had been _that_
large, things would've really sucked.

> The 2nd one I ran for the first time with your changes and did not
> compare with a kernel without them.

????  How could _any_ changes in that series have any impact whatsoever
on memcpy() performance?  Hell, just look at the diffstat - nothing in
there goes anywhere near the stuff involved in that test.  Nothing
whatsoever in arch/x86; no changes in lib/ outside of lib/iov_iter.c, etc.

What the series does deal with is the overhead of the glue that leads to
->read_iter() and ->write_iter(), as well as the overhead of
copy_to_iter()/copy_from_iter(), which becomes noticeable on fairly short
reads and writes.  It doesn't (and cannot) do anything for workloads
dominated by the time spent in raw_copy_to_user() or raw_copy_from_user() -
the code responsible for actually copying data between kernel and userland
memory is completely unaffected by any of that.
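The amortization point above can be checked directly: shrinking dd's block
size multiplies the number of read()/write() pairs - and thus the trips
through the ->read_iter()/->write_iter() glue - for the same amount of data,
so any per-call overhead shows up in the throughput.  A rough sketch (the
block sizes and counts are illustrative; actual throughput figures are
machine-dependent, so none are claimed here):

```shell
# Move the same 64 MiB from /dev/zero to /dev/null, but with very
# different syscall counts:
#   bs=1M -> 64 read()/write() pairs; per-call glue cost is amortized away.
#   bs=64 -> ~1M read()/write() pairs; per-call glue cost dominates.
dd if=/dev/zero of=/dev/null bs=1M count=64
dd if=/dev/zero of=/dev/null bs=64 count=$((64 * 1024 * 1024 / 64))
```

Comparing the reported throughput of the small-block run before and after
the series would expose the glue overhead; the bs=1M run, as noted above,
is dominated by the raw copy itself and should not move.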