Hi Noah, On Thu, May 25, 2023 at 8:04 PM tip-bot2 for Noah Goldstein <tip-bot2@xxxxxxxxxxxxx> wrote: > The following commit has been merged into the x86/misc branch of tip: > > Commit-ID: 688eb8191b475db5acfd48634600b04fd3dda9ad > Gitweb: https://git.kernel.org/tip/688eb8191b475db5acfd48634600b04fd3dda9ad > Author: Noah Goldstein <goldstein.w.n@xxxxxxxxx> > AuthorDate: Wed, 10 May 2023 20:10:02 -05:00 > Committer: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> > CommitterDate: Thu, 25 May 2023 10:55:18 -07:00 > > x86/csum: Improve performance of `csum_partial` > > 1) Add special case for len == 40 as that is the hottest value. The > nets a ~8-9% latency improvement and a ~30% throughput improvement > in the len == 40 case. > > 2) Use multiple accumulators in the 64-byte loop. This dramatically > improves ILP and results in up to a 40% latency/throughput > improvement (better for more iterations). > > Results from benchmarking on Icelake. Times measured with rdtsc() > len lat_new lat_old r tput_new tput_old r > 8 3.58 3.47 1.032 3.58 3.51 1.021 > 16 4.14 4.02 1.028 3.96 3.78 1.046 > 24 4.99 5.03 0.992 4.23 4.03 1.050 > 32 5.09 5.08 1.001 4.68 4.47 1.048 > 40 5.57 6.08 0.916 3.05 4.43 0.690 > 48 6.65 6.63 1.003 4.97 4.69 1.059 > 56 7.74 7.72 1.003 5.22 4.95 1.055 > 64 6.65 7.22 0.921 6.38 6.42 0.994 > 96 9.43 9.96 0.946 7.46 7.54 0.990 > 128 9.39 12.15 0.773 8.90 8.79 1.012 > 200 12.65 18.08 0.699 11.63 11.60 1.002 > 272 15.82 23.37 0.677 14.43 14.35 1.005 > 440 24.12 36.43 0.662 21.57 22.69 0.951 > 952 46.20 74.01 0.624 42.98 53.12 0.809 > 1024 47.12 78.24 0.602 46.36 58.83 0.788 > 1552 72.01 117.30 0.614 71.92 96.78 0.743 > 2048 93.07 153.25 0.607 93.28 137.20 0.680 > 2600 114.73 194.30 0.590 114.28 179.32 0.637 > 3608 156.34 268.41 0.582 154.97 254.02 0.610 > 4096 175.01 304.03 0.576 175.89 292.08 0.602 > > There is no such thing as a free lunch, however, and the special case > for len == 40 does add overhead to the len != 40 cases. This seems to > amount to be ~5% throughput and slightly less in terms of latency. > > Testing: > Part of this change is a new kunit test. The tests check all > alignment X length pairs in [0, 64) X [0, 512). > There are three cases. > 1) Precomputed random inputs/seed. The expected results where > generated use the generic implementation (which is assumed to be > non-buggy). > 2) An input of all 1s. The goal of this test is to catch any case > a carry is missing. > 3) An input that never carries. The goal of this test si to catch > any case of incorrectly carrying. > > More exhaustive tests that test all alignment X length pairs in > [0, 8192) X [0, 8192] on random data are also available here: > https://github.com/goldsteinn/csum-reproduction > > The reposity also has the code for reproducing the above benchmark > numbers. > > Signed-off-by: Noah Goldstein <goldstein.w.n@xxxxxxxxx> > Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> Thanks for your patch, which is now commit 688eb8191b475db5 ("x86/csum: Improve performance of `csum_partial`") in linus/master stable/master > Link: https://lore.kernel.org/all/20230511011002.935690-1-goldstein.w.n%40gmail.com This does not seem to be a message sent to a public mailing list archived at lore (yet). On m68k (ARAnyM): KTAP version 1 # Subtest: checksum 1..3 # test_csum_fixed_random_inputs: ASSERTION FAILED at lib/checksum_kunit.c:243 Expected result == expec, but result == 54991 (0xd6cf) expec == 33316 (0x8224) not ok 1 test_csum_fixed_random_inputs # test_csum_all_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:267 Expected result == expec, but result == 255 (0xff) expec == 65280 (0xff00) Endianness issue in the test? not ok 2 test_csum_all_carry_inputs # test_csum_no_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:306 Expected result == expec, but result == 64515 (0xfc03) expec == 0 (0x0) not ok 3 test_csum_no_carry_inputs # checksum: pass:0 fail:3 skip:0 total:3 # Totals: pass:0 fail:3 skip:0 total:3 not ok 1 checksum Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds