In the quest for pushing the limits of chacha20 encryption for both IPsec
and Wireguard, this small series adds AVX-512VL block functions. The VL
variant works on 256-bit ymm registers, but compared to AVX2 it can benefit
from the new instructions.

Compared to the AVX2 version, these block functions bring an overall speed
improvement of ~20% across encryption lengths.

Below are the tcrypt results for additional block sizes in kOps/s, for the
current AVX2 code path, the new AVX-512VL code path and the comparison to
Zinc in AVX2 and AVX-512VL. All numbers are from a Xeon Platinum 8168
(2.7GHz).

These numbers result in a very nice chart, available at:
  https://download.strongswan.org/misc/chacha-avx-512vl.svg

                  zinc  zinc
 len  avx2 512vl  avx2 512vl
   8  5719  5672  5468  5612
  16  5675  5627  5355  5621
  24  5687  5601  5322  5633
  32  5667  5622  5244  5564
  40  5603  5582  5337  5578
  48  5638  5539  5400  5556
  56  5624  5566  5375  5482
  64  5590  5573  5352  5531
  72  4841  5467  3365  3457
  80  5316  5761  3310  3381
  88  4798  5470  3239  3343
  96  5324  5723  3197  3281
 104  4819  5460  3155  3232
 112  5266  5749  3020  3195
 120  4776  5391  2959  3145
 128  5291  5723  3398  3489
 136  4122  4837  3321  3423
 144  4507  5057  3247  3389
 152  4139  4815  3233  3329
 160  4482  5043  3159  3256
 168  4142  4766  3131  3224
 176  4506  5028  3073  3162
 184  4119  4772  3010  3109
 192  4499  5016  3402  3502
 200  4127  4766  3329  3448
 208  4452  5012  3276  3371
 216  4128  4744  3243  3334
 224  4484  5008  3203  3298
 232  4103  4772  3141  3237
 240  4458  4963  3115  3217
 248  4121  4751  3085  3177
 256  4461  4987  3364  4046
 264  3406  4282  3270  4006
 272  3408  4287  3207  3961
 280  3371  4271  3203  3825
 288  3625  4301  3129  3751
 296  3402  4283  3093  3688
 304  3401  4247  3062  3637
 312  3382  4282  2995  3614
 320  3611  4279  3305  4070
 328  3386  4260  3276  3968
 336  3369  4288  3171  3929
 344  3389  4289  3134  3847
 352  3609  4266  3127  3720
 360  3355  4252  3076  3692
 368  3387  4264  3048  3650
 376  3387  4238  2967  3553
 384  3568  4265  3277  4035
 392  3369  4262  3299  3973
 400  3362  4235  3239  3899
 408  3352  4269  3196  3843
 416  3585  4243  3127  3736
 424  3364  4216  3092  3672
 432  3341  4246  3067  3628
 440  3353  4235  3018  3593
 448  3538  4245  3327  4035
 456  3322  4244  3275  3900
 464  3340  4237  3212  3880
 472  3330  4242  3054  3802
 480  3530  4234  3078  3707
 488  3337  4228  3094  3664
 496  3330  4223  3015  3591
 504  3317  4214  3002  3517
 512  3531  4197  3339  4016
 520  2511  3101  2030  2682
 528  2627  3087  2027  2641
 536  2508  3102  2001  2601
 544  2638  3090  1964  2564
 552  2494  3077  1962  2516
 560  2625  3064  1941  2515
 568  2500  3086  1922  2493
 576  2611  3074  2050  2689
 584  2482  3062  2041  2680
 592  2595  3074  2026  2644
 600  2470  3060  1985  2595
 608  2581  3039  1961  2555
 616  2478  3062  1956  2521
 624  2587  3066  1930  2493
 632  2457  3053  1923  2486
 640  2581  3050  2059  2712
 648  2296  2839  2024  2655
 656  2389  2845  2019  2642
 664  2292  2842  2002  2610
 672  2404  2838  1959  2537
 680  2273  2827  1956  2527
 688  2389  2840  1938  2510
 696  2280  2837  1911  2463
 704  2370  2819  2055  2702
 712  2277  2834  2029  2663
 720  2369  2829  2020  2625
 728  2255  2820  2001  2600
 736  2373  2819  1958  2543
 744  2269  2827  1956  2524
 752  2364  2817  1937  2492
 760  2270  2805  1909  2483
 768  2378  2820  2050  2696
 776  2053  2700  2002  2643
 784  2066  2693  1922  2640
 792  2065  2703  1928  2602
 800  2138  2706  1962  2535
 808  2065  2679  1938  2528
 816  2063  2699  1929  2500
 824  2053  2676  1915  2468
 832  2149  2692  2036  2693
 840  2055  2689  2024  2659
 848  2049  2689  2006  2610
 856  2057  2702  1979  2585
 864  2144  2703  1960  2547
 872  2047  2685  1945  2501
 880  2055  2683  1902  2497
 888  2060  2689  1897  2478
 896  2139  2693  2023  2663
 904  2049  2686  1970  2644
 912  2055  2688  1925  2621
 920  2047  2685  1911  2572
 928  2114  2695  1907  2545
 936  2055  2681  1927  2492
 944  2055  2693  1930  2478
 952  2042  2688  1909  2471
 960  2136  2682  2014  2672
 968  2054  2687  1999  2626
 976  2040  2682  1982  2598
 984  2055  2687  1943  2569
 992  2138  2694  1884  2522
1000  2036  2681  1929  2506
1008  2052  2676  1926  2475
1016  2050  2686  1889  2430
1024  2125  2670  2039  2656
1032  1717  2175  1470  1995
1040  1768  2186  1456  1983
1048  1704  2185  1451  1950
1056  1770  2176  1410  1927
1064  1710  2178  1418  1918
1072  1753  2168  1394  1892
1080  1696  2170  1400  1892
1088  1761  2174  1472  2014
1096  1681  2158  1464  1968
1104  1746  2172  1457  1978
1112  1689  2167  1445  1955
1120  1738  2160  1431  1919
1128  1689  2155  1428  1915
1136  1747  2169  1415  1899
1144  1678  2161  1403  1881
1152  1749  2159  1474  2007
1160  1601  2050  1470  1991
1168  1648  2057  1461  1969
1176  1605  2043  1439  1948
1184  1654  2057  1428  1926
1192  1595  2051  1427  1899
1200  1647  2036  1419  1902
1208  1598  2048  1402  1888
1216  1643  2053  1471  1991
1224  1595  2043  1469  1987
1232  1649  2048  1456  1971
1240  1599  2040  1436  1939
1248  1644  2042  1433  1918
1256  1602  2045  1424  1900
1264  1648  2048  1413  1878
1272  1591  2034  1401  1878
1280  1649  2044  1475  2002
1288  1493  1984  1461  1972
1296  1484  1971  1438  1962
1304  1490  1985  1443  1947
1312  1535  1987  1425  1913
1320  1481  1965  1410  1901
1328  1493  1984  1407  1900
1336  1493  1979  1396  1882
1344  1526  1980  1465  1988
1352  1492  1970  1463  1983
1360  1487  1974  1452  1966
1368  1481  1977  1439  1937
1376  1535  1970  1428  1915
1384  1489  1973  1417  1905
1392  1483  1974  1415  1881
1400  1485  1963  1403  1882
1408  1523  1976  1466  1988
1416  1477  1969  1459  1964
1424  1487  1975  1455  1966
1432  1488  1972  1438  1941
1440  1518  1958  1432  1908
1448  1484  1972  1421  1905
1456  1485  1973  1398  1888
1464  1476  1962  1399  1870
1472  1530  1975  1471  1998
1480  1478  1967  1452  1979
1488  1478  1963  1453  1947
1496  1477  1963  1438  1930

Martin Willi (3):
  crypto: x86/chacha20 - Add a 8-block AVX-512VL variant
  crypto: x86/chacha20 - Add a 2-block AVX-512VL variant
  crypto: x86/chacha20 - Add a 4-block AVX-512VL variant

 arch/x86/crypto/Makefile                   |   5 +
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 839 +++++++++++++++++++++
 arch/x86/crypto/chacha20_glue.c            |  40 +
 3 files changed, 884 insertions(+)
 create mode 100644 arch/x86/crypto/chacha20-avx512vl-x86_64.S

--
2.17.1