Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

On Sun, Nov 11, 2018 at 10:36:24AM +0100, Martin Willi wrote:
> This patchset improves performance of the ChaCha20 SIMD implementations
> for x86_64. For some specific encryption lengths, performance is more
> than doubled. Two mechanisms are used to achieve this:
> 
> * Instead of calculating the minimal number of required blocks for a
>   given encryption length, functions producing more blocks are used
>   more aggressively. Calculating a 4-block function can be faster than
>   calculating a 2-block and a 1-block function, even if only three
>   blocks are actually required.
> 
> * In addition to the 8-block AVX2 function, a 4-block and a 2-block
>   function are introduced.
> 
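
The trade-off in the first bullet can be sketched in C as follows. This is a simplified model for illustration only, not the code in chacha20_glue.c: struct plan, plan_minimal() and plan_aggressive() are hypothetical names, both policies are assumed to have 1-, 2-, 4- and 8-block functions available, and only call counts and computed blocks are modelled, not the keystream itself.

```c
#include <assert.h>
#include <stddef.h>

#define CHACHA20_BLOCK_SIZE 64	/* one ChaCha20 keystream block, in bytes */

/* Hypothetical summary of a dispatch plan; not a kernel type. */
struct plan {
	unsigned int calls;	/* number of block-function invocations */
	size_t blocks;		/* keystream blocks actually computed */
};

/* Blocks needed to cover len bytes, rounding a partial block up. */
static size_t blocks_needed(size_t len)
{
	return (len + CHACHA20_BLOCK_SIZE - 1) / CHACHA20_BLOCK_SIZE;
}

/* Minimal policy: only call a function if all of its blocks are needed. */
static struct plan plan_minimal(size_t len)
{
	static const size_t widths[] = { 8, 4, 2, 1 };
	struct plan p = { 0, 0 };
	size_t need = blocks_needed(len);
	size_t i;

	for (i = 0; i < sizeof(widths) / sizeof(widths[0]); i++) {
		while (need >= widths[i]) {
			need -= widths[i];
			p.blocks += widths[i];
			p.calls++;
		}
	}
	assert(need == 0);	/* the 1-block width always finishes the job */
	return p;
}

/*
 * Aggressive policy: full 8-block calls while at least 8 blocks remain,
 * then a single call to the smallest function covering the remainder,
 * even if some of its output blocks are discarded.
 */
static struct plan plan_aggressive(size_t len)
{
	static const size_t widths[] = { 1, 2, 4, 8 };
	struct plan p = { 0, 0 };
	size_t need = blocks_needed(len);
	size_t i;

	while (need >= 8) {
		need -= 8;
		p.blocks += 8;
		p.calls++;
	}
	for (i = 0; need > 0 && i < sizeof(widths) / sizeof(widths[0]); i++) {
		if (widths[i] >= need) {
			p.blocks += widths[i];
			p.calls++;
			need = 0;
		}
	}
	return p;
}
```

For a 192-byte request (three blocks), plan_minimal() yields two calls computing three blocks, while plan_aggressive() yields one call computing four blocks, one of which is discarded.
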
> Patches 1-3 add support for partial lengths to the existing 1-, 4- and
> 8-block functions. Patch 4 makes use of that by engaging the next higher
> level block functions more aggressively. Patches 5 and 6 add the new AVX2
> functions for 2 and 4 blocks. Patches are based on cryptodev and would
> need adjustments to apply on top of the Adiantum patchset.
> 
> Note that the more aggressive use of larger block functions calculates
> blocks that may get discarded. This may have a negative impact on energy
> usage or the processor's thermal budget. However, with the new block
> functions we can avoid this over-calculation for many lengths, so the
> performance win can be considered more important.
> 
> Below are performance numbers measured with tcrypt using additional
> encryption lengths; numbers are in kOps/s, on my i7-5557U. "old" is the
> existing implementation, "new" is the implementation with this patchset.
> For comparison, the "zinc" column gives the numbers for zinc in v6:
> 
>  len  old  new zinc
>    8 5908 5818 5818
>   16 5917 5828 5726
>   24 5916 5869 5757
>   32 5920 5789 5813
>   40 5868 5799 5710
>   48 5877 5761 5761
>   56 5869 5797 5742
>   64 5897 5862 5685
>   72 3381 4979 3520
>   80 3364 5541 3475
>   88 3350 4977 3424
>   96 3342 5530 3371
>  104 3328 4923 3313
>  112 3317 5528 3207
>  120 3313 4970 3150
>  128 3492 5535 3568
>  136 2487 4570 3690
>  144 2481 5047 3599
>  152 2473 4565 3566
>  160 2459 5022 3515
>  168 2461 4550 3437
>  176 2454 5020 3325
>  184 2449 4535 3279
>  192 2538 5011 3762
>  200 1962 4537 3702
>  208 1962 4971 3622
>  216 1954 4487 3518
>  224 1949 4936 3445
>  232 1948 4497 3422
>  240 1941 4947 3317
>  248 1940 4481 3279
>  256 3798 4964 3723
>  264 2638 3577 3639
>  272 2637 3567 3597
>  280 2628 3563 3565
>  288 2630 3795 3484
>  296 2621 3580 3422
>  304 2612 3569 3352
>  312 2602 3599 3308
>  320 2694 3821 3694
>  328 2060 3538 3681
>  336 2054 3565 3599
>  344 2054 3553 3523
>  352 2049 3809 3419
>  360 2045 3575 3403
>  368 2035 3560 3334
>  376 2036 3555 3257
>  384 2092 3785 3715
>  392 1691 3505 3612
>  400 1684 3527 3553
>  408 1686 3527 3496
>  416 1684 3804 3430
>  424 1681 3555 3402
>  432 1675 3559 3311
>  440 1672 3558 3275
>  448 1710 3780 3689
>  456 1431 3541 3618
>  464 1428 3538 3576
>  472 1430 3527 3509
>  480 1426 3788 3405
>  488 1423 3502 3397
>  496 1423 3519 3298
>  504 1418 3519 3277
>  512 3694 3736 3735
>  520 2601 2571 2209
>  528 2601 2677 2148
>  536 2587 2534 2164
>  544 2578 2659 2138
>  552 2570 2552 2126
>  560 2566 2661 2035
>  568 2567 2542 2041
>  576 2639 2674 2199
>  584 2031 2531 2183
>  592 2027 2660 2145
>  600 2016 2513 2155
>  608 2009 2638 2133
>  616 2006 2522 2115
>  624 2000 2649 2064
>  632 1996 2518 2045
>  640 2053 2651 2188
>  648 1666 2402 2182
>  656 1663 2517 2158
>  664 1659 2397 2147
>  672 1657 2510 2139
>  680 1656 2394 2114
>  688 1653 2497 2077
>  696 1646 2393 2043
>  704 1678 2510 2208
>  712 1414 2391 2189
>  720 1412 2506 2169
>  728 1411 2384 2145
>  736 1408 2494 2142
>  744 1408 2379 2081
>  752 1405 2485 2064
>  760 1403 2376 2043
>  768 2189 2498 2211
>  776 1756 2137 2192
>  784 1746 2145 2146
>  792 1744 2141 2141
>  800 1743 2222 2094
>  808 1742 2140 2100
>  816 1735 2134 2061
>  824 1731 2135 2045
>  832 1778 2222 2223
>  840 1480 2132 2184
>  848 1480 2134 2173
>  856 1476 2124 2145
>  864 1474 2210 2126
>  872 1472 2127 2105
>  880 1463 2123 2056
>  888 1468 2123 2043
>  896 1494 2208 2219
>  904 1278 2120 2192
>  912 1277 2121 2170
>  920 1273 2118 2149
>  928 1272 2207 2125
>  936 1267 2125 2098
>  944 1265 2127 2060
>  952 1267 2126 2049
>  960 1289 2213 2204
>  968 1125 2123 2187
>  976 1122 2127 2166
>  984 1120 2123 2136
>  992 1118 2207 2119
> 1000 1118 2120 2101
> 1008 1117 2122 2042
> 1016 1115 2121 2048
> 1024 2174 2191 2195
> 1032 1748 1724 1565
> 1040 1745 1782 1544
> 1048 1736 1737 1554
> 1056 1738 1802 1541
> 1064 1735 1728 1523
> 1072 1730 1780 1507
> 1080 1729 1724 1497
> 1088 1757 1783 1592
> 1096 1475 1723 1575
> 1104 1474 1778 1563
> 1112 1472 1708 1544
> 1120 1468 1774 1521
> 1128 1466 1718 1521
> 1136 1462 1780 1501
> 1144 1460 1719 1491
> 1152 1481 1782 1575
> 1160 1271 1647 1558
> 1168 1271 1706 1554
> 1176 1268 1645 1545
> 1184 1265 1711 1538
> 1192 1265 1648 1530
> 1200 1264 1705 1493
> 1208 1262 1647 1498
> 1216 1277 1695 1581
> 1224 1120 1642 1563
> 1232 1115 1702 1549
> 1240 1121 1646 1538
> 1248 1119 1703 1527
> 1256 1115 1640 1520
> 1264 1114 1693 1505
> 1272 1112 1642 1492
> 1280 1552 1699 1574
> 1288 1314 1525 1573
> 1296 1315 1522 1551
> 1304 1312 1521 1548
> 1312 1311 1564 1535
> 1320 1309 1518 1524
> 1328 1302 1527 1508
> 1336 1303 1521 1500
> 1344 1333 1561 1579
> 1352 1157 1524 1573
> 1360 1152 1520 1546
> 1368 1154 1522 1545
> 1376 1153 1562 1536
> 1384 1151 1525 1526
> 1392 1149 1523 1504
> 1400 1148 1517 1480
> 1408 1167 1561 1589
> 1416 1030 1516 1558
> 1424 1028 1516 1546
> 1432 1027 1522 1537
> 1440 1027 1564 1523
> 1448 1026 1507 1512
> 1456 1025 1515 1491
> 1464 1023 1522 1481
> 1472 1037 1559 1577
> 1480  927 1518 1559
> 1488  926 1514 1548
> 1496  926 1513 1534
> 
> 
> Martin Willi (6):
>   crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3
>     variant
>   crypto: x86/chacha20 - Support partial lengths in 4-block SSSE3
>     variant
>   crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant
>   crypto: x86/chacha20 - Use larger block functions more aggressively
>   crypto: x86/chacha20 - Add a 2-block AVX2 variant
>   crypto: x86/chacha20 - Add a 4-block AVX2 variant
> 
>  arch/x86/crypto/chacha20-avx2-x86_64.S  | 696 ++++++++++++++++++++++--
>  arch/x86/crypto/chacha20-ssse3-x86_64.S | 237 ++++++--
>  arch/x86/crypto/chacha20_glue.c         |  72 ++-
>  3 files changed, 868 insertions(+), 137 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


