On Sat, May 28, 2022 at 9:57 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> >
> > * Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> >
> > > On Mon, May 23, 2022 at 10:03:45AM -0600, Jens Axboe wrote:
> > > > clear_user()
> > > > 32 ~96MB/sec
> > > > 64 195MB/sec
> > > > 128 386MB/sec
> > > > 1k 2.7GB/sec
> > > > 4k 7.8GB/sec
> > > > 16k 14.8GB/sec
> > > >
> > > > copy_from_zero_page()
> > > > 32 ~96MB/sec
> > > > 64 193MB/sec
> > > > 128 383MB/sec
> > > > 1k 2.9GB/sec
> > > > 4k 9.8GB/sec
> > > > 16k 21.8GB/sec
> > >
> > > Just FYI, on x86, Samuel Neves proposed some nice clear_user()
> > > performance improvements that were forgotten about:
> > >
> > > https://lore.kernel.org/lkml/20210523180423.108087-1-sneves@xxxxxxxxx/
> > > https://lore.kernel.org/lkml/Yk9yBcj78mpXOOLL@xxxxxxxxx/
> > >
> > > Hoping somebody picks this up at some point...
> >
> > Those ~2x speedup numbers are indeed looking very nice:
> >
> > | After this patch, on a Skylake CPU, these are the
> > | before/after figures:
> > |
> > | $ dd if=/dev/zero of=/dev/null bs=1024k status=progress
> > | 94402248704 bytes (94 GB, 88 GiB) copied, 6 s, 15.7 GB/s
> > |
> > | $ dd if=/dev/zero of=/dev/null bs=1024k status=progress
> > | 446476320768 bytes (446 GB, 416 GiB) copied, 15 s, 29.8 GB/s
> >
> > Patch fell through the cracks & it doesn't apply anymore:
> >
> > checking file arch/x86/lib/usercopy_64.c
> > Hunk #2 FAILED at 17.
> > 1 out of 2 hunks FAILED
> >
> > Would be nice to re-send it.
>
> Turns out Boris just sent a competing optimization to clear_user() 3 days ago:
>
> https://lore.kernel.org/r/YozQZMyQ0NDdD8cH@xxxxxxx
>
> Thanks,
>

[ CC Hugh ]

I hope I adapted both patches from Hugh and Samuel against Linux v5.18
correctly.

As I have no "modern CPU" (I am on Intel Sandy-Bridge), Hugh's patch was
not meant for my machine (see numbers).
Samuel's patch gave me a speedup of about 15% when running Hugh's dd
test-case (I cannot say whether this is a real benchmark for testing).

Patches and latest linux-config attached.

*** Without patch

root# cat /proc/version
Linux version 5.18.0-3-amd64-clang14-lto (sedat.dilek@xxxxxxxxx@iniza) (dileks clang version 14.0.4 (https://github.com/llvm/llvm-project.git 29f1039a7285a5c3a9c353d054140bf2556d4c4d), LLD 14.0.4) #3~bookworm+dileks1 SMP PREEMPT_DYNAMIC 2022-05-27

root# dd if=/dev/zero of=/dev/null bs=1M count=1M
1048576+0 records in
1048576+0 records out
1099511627776 bytes (1.1 TB, 1.0 TiB) copied, 97.18 s, 11.3 GB/s

*** With hughd patch

Patch: 0001-x86-usercopy-Use-alternatives-for-clear_user.patch
Link: https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@xxxxxxxxxx/

root# cat /proc/version
Linux version 5.18.0-4-amd64-clang14-lto (sedat.dilek@xxxxxxxxx@iniza) (dileks clang version 14.0.4 (https://github.com/llvm/llvm-project.git 29f1039a7285a5c3a9c35>

root# dd if=/dev/zero of=/dev/null bs=1M count=1M
1048576+0 records in
1048576+0 records out
1099511627776 bytes (1.1 TB, 1.0 TiB) copied, 588.053 s, 1.9 GB/s

root# cat /proc/version
Linux version 5.18.0-4-amd64-clang14-lto (sedat.dilek@xxxxxxxxx@iniza) (dileks clang version 14.0.4 (https://github.com/llvm/llvm-project.git 29f1039a7285a5c3a9c353d054140bf2556d4c4d), LLD 14.0.4) #4~bookworm+dileks1 SMP PREEMPT_DYNAMIC 2022-05-28

*** With sneves patch

Patch: 0001-x86-usercopy-speed-up-64-bit-__clear_user-with-stos-.patch
Link: https://lore.kernel.org/lkml/20210523180423.108087-1-sneves@xxxxxxxxx/

root# cat /proc/version
Linux version 5.18.0-5-amd64-clang14-lto (sedat.dilek@xxxxxxxxx@iniza) (dileks clang version 14.0.4 (https://github.com/llvm/llvm-project.git 29f1039a7285a5c3a9c353d054140bf2556d4c4d), LLD 14.0.4) #5~bookworm+dileks1 SMP PREEMPT_DYNAMIC 2022-05-28

root# dd if=/dev/zero of=/dev/null bs=1M count=1M
1048576+0 records in
1048576+0 records out
1099511627776 bytes (1.1 TB, 1.0 TiB) copied, 82.697 s, 13.3 GB/s

-dileks // 28-May-2022
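
For reference, the relative changes implied by the three dd runs above can be
recomputed with a few lines of Python. This is only a sketch of the
arithmetic. The byte count and elapsed times are copied from the dd output.
The dictionary labels and function names are made up for illustration and
are not part of either patch.

```python
# Figures copied from the three dd runs above (1 TiB copied each time).
BYTES_COPIED = 1099511627776

# Elapsed wall-clock seconds per kernel; the keys are illustrative labels only.
elapsed_s = {
    "unpatched": 97.18,
    "hughd": 588.053,
    "sneves": 82.697,
}


def throughput_gb_per_s(seconds):
    """Throughput in GB/s (10**9 bytes per second), the unit dd reports."""
    return BYTES_COPIED / seconds / 1e9


def time_change_percent(baseline_s, patched_s):
    """Percent change in elapsed time; a negative value means faster."""
    return (patched_s - baseline_s) / baseline_s * 100.0


def report():
    """Return one formatted line per kernel, relative to the unpatched run."""
    base = elapsed_s["unpatched"]
    return [
        f"{name:10s} {throughput_gb_per_s(secs):5.1f} GB/s "
        f"{time_change_percent(base, secs):+7.1f}% elapsed time"
        for name, secs in elapsed_s.items()
    ]


if __name__ == "__main__":
    print("\n".join(report()))
```

Read this way, the "15%" figure matches the reduction in elapsed time for the
sneves kernel (97.18 s down to 82.697 s, about 14.9% less). Expressed as
throughput, the same run is roughly 17.5% faster. These are single runs, so
the numbers say nothing about run-to-run variance.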