Quoting Borislav Petkov <bp@xxxxxxxxx>:
On Wed, Aug 15, 2012 at 08:34:25PM +0300, Jussi Kivilinna wrote:
About 5% slower, probably because I was tuning for sandy-bridge and
introduced more FPU<=>CPU register moves.
Here's a new version of the patch, with the FPU<=>CPU moves from the
original implementation.
(Note: this also changes the encryption function to inline all code into
the main function; decryption still places common code in a separate
function to reduce object size. This is to measure the difference.)
Yep, looks better than the previous run and also a bit better or on par
with the initial run I did.
Thanks again. The speed gain with the patch is ~8%, which is enough to
get twofish-avx past twofish-3way.
The thing is, I'm not sure whether optimizing the thing for each uarch
is a workable solution software-wise or maybe having a single version
which performs sufficiently ok on all uarches is easier/better to
maintain without causing code bloat. Hmmm...
Agreed, testing on multiple CPUs to get a single well-working version is
what I have done in the past. But purchasing all the latest CPUs on
the market isn't an option for me, and for testing AVX I'm stuck with
sandy-bridge :)
-Jussi
4th:
====
ran like 1st.
[ 1014.074150]
[ 1014.074150] testing speed of async ecb(twofish) encryption
[ 1014.083829] test 0 (128 bit key, 16 byte blocks): 4870055
operations in 1 seconds (77920880 bytes)
[ 1015.092757] test 1 (128 bit key, 64 byte blocks): 2043828
operations in 1 seconds (130804992 bytes)
[ 1016.099441] test 2 (128 bit key, 256 byte blocks): 606400
operations in 1 seconds (155238400 bytes)
[ 1017.105939] test 3 (128 bit key, 1024 byte blocks): 168939
operations in 1 seconds (172993536 bytes)
[ 1018.112517] test 4 (128 bit key, 8192 byte blocks): 21777
operations in 1 seconds (178397184 bytes)
[ 1019.119035] test 5 (192 bit key, 16 byte blocks): 4882254
operations in 1 seconds (78116064 bytes)
[ 1020.125716] test 6 (192 bit key, 64 byte blocks): 2043230
operations in 1 seconds (130766720 bytes)
[ 1021.132391] test 7 (192 bit key, 256 byte blocks): 607477
operations in 1 seconds (155514112 bytes)
[ 1022.138889] test 8 (192 bit key, 1024 byte blocks): 168743
operations in 1 seconds (172792832 bytes)
[ 1023.145476] test 9 (192 bit key, 8192 byte blocks): 21442
operations in 1 seconds (175652864 bytes)
[ 1024.152012] test 10 (256 bit key, 16 byte blocks): 4891863
operations in 1 seconds (78269808 bytes)
[ 1025.158684] test 11 (256 bit key, 64 byte blocks): 2049390
operations in 1 seconds (131160960 bytes)
[ 1026.165366] test 12 (256 bit key, 256 byte blocks): 606847
operations in 1 seconds (155352832 bytes)
[ 1027.171841] test 13 (256 bit key, 1024 byte blocks): 169228
operations in 1 seconds (173289472 bytes)
[ 1028.178436] test 14 (256 bit key, 8192 byte blocks): 21773
operations in 1 seconds (178364416 bytes)
[ 1029.184981]
[ 1029.184981] testing speed of async ecb(twofish) decryption
[ 1029.194508] test 0 (128 bit key, 16 byte blocks): 4931065
operations in 1 seconds (78897040 bytes)
[ 1030.199640] test 1 (128 bit key, 64 byte blocks): 2056931
operations in 1 seconds (131643584 bytes)
[ 1031.206303] test 2 (128 bit key, 256 byte blocks): 589409
operations in 1 seconds (150888704 bytes)
[ 1032.212832] test 3 (128 bit key, 1024 byte blocks): 163681
operations in 1 seconds (167609344 bytes)
[ 1033.219443] test 4 (128 bit key, 8192 byte blocks): 21062
operations in 1 seconds (172539904 bytes)
[ 1034.225979] test 5 (192 bit key, 16 byte blocks): 4931537
operations in 1 seconds (78904592 bytes)
[ 1035.232608] test 6 (192 bit key, 64 byte blocks): 2053989
operations in 1 seconds (131455296 bytes)
[ 1036.239289] test 7 (192 bit key, 256 byte blocks): 589591
operations in 1 seconds (150935296 bytes)
[ 1037.241784] test 8 (192 bit key, 1024 byte blocks): 163565
operations in 1 seconds (167490560 bytes)
[ 1038.244387] test 9 (192 bit key, 8192 byte blocks): 20899
operations in 1 seconds (171204608 bytes)
[ 1039.250923] test 10 (256 bit key, 16 byte blocks): 4937343
operations in 1 seconds (78997488 bytes)
[ 1040.257589] test 11 (256 bit key, 64 byte blocks): 2050678
operations in 1 seconds (131243392 bytes)
[ 1041.264262] test 12 (256 bit key, 256 byte blocks): 586869
operations in 1 seconds (150238464 bytes)
[ 1042.270753] test 13 (256 bit key, 1024 byte blocks): 163548
operations in 1 seconds (167473152 bytes)
[ 1043.277365] test 14 (256 bit key, 8192 byte blocks): 21053
operations in 1 seconds (172466176 bytes)
[ 1044.283892]
[ 1044.283892] testing speed of async cbc(twofish) encryption
[ 1044.293349] test 0 (128 bit key, 16 byte blocks): 5186240
operations in 1 seconds (82979840 bytes)
[ 1045.298534] test 1 (128 bit key, 64 byte blocks): 1921034
operations in 1 seconds (122946176 bytes)
[ 1046.305207] test 2 (128 bit key, 256 byte blocks): 542787
operations in 1 seconds (138953472 bytes)
[ 1047.311699] test 3 (128 bit key, 1024 byte blocks): 141399
operations in 1 seconds (144792576 bytes)
[ 1048.318312] test 4 (128 bit key, 8192 byte blocks): 17755
operations in 1 seconds (145448960 bytes)
[ 1049.324829] test 5 (192 bit key, 16 byte blocks): 5196441
operations in 1 seconds (83143056 bytes)
[ 1050.331485] test 6 (192 bit key, 64 byte blocks): 1921456
operations in 1 seconds (122973184 bytes)
[ 1051.338157] test 7 (192 bit key, 256 byte blocks): 543581
operations in 1 seconds (139156736 bytes)
[ 1052.344658] test 8 (192 bit key, 1024 byte blocks): 141473
operations in 1 seconds (144868352 bytes)
[ 1053.351270] test 9 (192 bit key, 8192 byte blocks): 17601
operations in 1 seconds (144187392 bytes)
[ 1054.357823] test 10 (256 bit key, 16 byte blocks): 5190283
operations in 1 seconds (83044528 bytes)
[ 1055.364462] test 11 (256 bit key, 64 byte blocks): 1912796
operations in 1 seconds (122418944 bytes)
[ 1056.371134] test 12 (256 bit key, 256 byte blocks): 542719
operations in 1 seconds (138936064 bytes)
[ 1057.377643] test 13 (256 bit key, 1024 byte blocks): 141377
operations in 1 seconds (144770048 bytes)
[ 1058.384229] test 14 (256 bit key, 8192 byte blocks): 17752
operations in 1 seconds (145424384 bytes)
[ 1059.390799]
[ 1059.390799] testing speed of async cbc(twofish) decryption
[ 1059.400187] test 0 (128 bit key, 16 byte blocks): 4889197
operations in 1 seconds (78227152 bytes)
[ 1060.405460] test 1 (128 bit key, 64 byte blocks): 1980831
operations in 1 seconds (126773184 bytes)
[ 1061.408145] test 2 (128 bit key, 256 byte blocks): 568695
operations in 1 seconds (145585920 bytes)
[ 1062.410647] test 3 (128 bit key, 1024 byte blocks): 158294
operations in 1 seconds (162093056 bytes)
[ 1063.417258] test 4 (128 bit key, 8192 byte blocks): 20312
operations in 1 seconds (166395904 bytes)
[ 1064.423758] test 5 (192 bit key, 16 byte blocks): 4904906
operations in 1 seconds (78478496 bytes)
[ 1065.430440] test 6 (192 bit key, 64 byte blocks): 1983636
operations in 1 seconds (126952704 bytes)
[ 1066.437104] test 7 (192 bit key, 256 byte blocks): 564340
operations in 1 seconds (144471040 bytes)
[ 1067.443613] test 8 (192 bit key, 1024 byte blocks): 157404
operations in 1 seconds (161181696 bytes)
[ 1068.450216] test 9 (192 bit key, 8192 byte blocks): 20055
operations in 1 seconds (164290560 bytes)
[ 1069.456753] test 10 (256 bit key, 16 byte blocks): 4901215
operations in 1 seconds (78419440 bytes)
[ 1070.463417] test 11 (256 bit key, 64 byte blocks): 1978968
operations in 1 seconds (126653952 bytes)
[ 1071.470073] test 12 (256 bit key, 256 byte blocks): 568440
operations in 1 seconds (145520640 bytes)
[ 1072.476580] test 13 (256 bit key, 1024 byte blocks): 158329
operations in 1 seconds (162128896 bytes)
[ 1073.483177] test 14 (256 bit key, 8192 byte blocks): 20311
operations in 1 seconds (166387712 bytes)
[ 1074.489739]
[ 1074.489739] testing speed of async ctr(twofish) encryption
[ 1074.499266] test 0 (128 bit key, 16 byte blocks): 4565109
operations in 1 seconds (73041744 bytes)
[ 1075.504391] test 1 (128 bit key, 64 byte blocks): 1955085
operations in 1 seconds (125125440 bytes)
[ 1076.511055] test 2 (128 bit key, 256 byte blocks): 573971
operations in 1 seconds (146936576 bytes)
[ 1077.517563] test 3 (128 bit key, 1024 byte blocks): 158489
operations in 1 seconds (162292736 bytes)
[ 1078.524175] test 4 (128 bit key, 8192 byte blocks): 20330
operations in 1 seconds (166543360 bytes)
[ 1079.530702] test 5 (192 bit key, 16 byte blocks): 4550468
operations in 1 seconds (72807488 bytes)
[ 1080.537358] test 6 (192 bit key, 64 byte blocks): 1943897
operations in 1 seconds (124409408 bytes)
[ 1081.544030] test 7 (192 bit key, 256 byte blocks): 564033
operations in 1 seconds (144392448 bytes)
[ 1082.550531] test 8 (192 bit key, 1024 byte blocks): 157126
operations in 1 seconds (160897024 bytes)
[ 1083.557170] test 9 (192 bit key, 8192 byte blocks): 20121
operations in 1 seconds (164831232 bytes)
[ 1084.563713] test 10 (256 bit key, 16 byte blocks): 4403637
operations in 1 seconds (70458192 bytes)
[ 1085.570360] test 11 (256 bit key, 64 byte blocks): 1961264
operations in 1 seconds (125520896 bytes)
[ 1086.577008] test 12 (256 bit key, 256 byte blocks): 571514
operations in 1 seconds (146307584 bytes)
[ 1087.583517] test 13 (256 bit key, 1024 byte blocks): 158342
operations in 1 seconds (162142208 bytes)
[ 1088.590121] test 14 (256 bit key, 8192 byte blocks): 20392
operations in 1 seconds (167051264 bytes)
[ 1089.596648]
[ 1089.596648] testing speed of async ctr(twofish) decryption
[ 1089.606061] test 0 (128 bit key, 16 byte blocks): 4517104
operations in 1 seconds (72273664 bytes)
[ 1090.611326] test 1 (128 bit key, 64 byte blocks): 1953102
operations in 1 seconds (124998528 bytes)
[ 1091.617989] test 2 (128 bit key, 256 byte blocks): 574354
operations in 1 seconds (147034624 bytes)
[ 1092.624497] test 3 (128 bit key, 1024 byte blocks): 158402
operations in 1 seconds (162203648 bytes)
[ 1093.631110] test 4 (128 bit key, 8192 byte blocks): 20369
operations in 1 seconds (166862848 bytes)
[ 1094.637618] test 5 (192 bit key, 16 byte blocks): 4524710
operations in 1 seconds (72395360 bytes)
[ 1095.644293] test 6 (192 bit key, 64 byte blocks): 1940148
operations in 1 seconds (124169472 bytes)
[ 1096.650957] test 7 (192 bit key, 256 byte blocks): 567684
operations in 1 seconds (145327104 bytes)
[ 1097.657466] test 8 (192 bit key, 1024 byte blocks): 158922
operations in 1 seconds (162736128 bytes)
[ 1098.664088] test 9 (192 bit key, 8192 byte blocks): 20087
operations in 1 seconds (164552704 bytes)
[ 1099.670596] test 10 (256 bit key, 16 byte blocks): 4397085
operations in 1 seconds (70353360 bytes)
[ 1100.677278] test 11 (256 bit key, 64 byte blocks): 1961007
operations in 1 seconds (125504448 bytes)
[ 1101.683933] test 12 (256 bit key, 256 byte blocks): 577961
operations in 1 seconds (147958016 bytes)
[ 1102.690452] test 13 (256 bit key, 1024 byte blocks): 158836
operations in 1 seconds (162648064 bytes)
[ 1103.697038] test 14 (256 bit key, 8192 byte blocks): 20427
operations in 1 seconds (167337984 bytes)
[ 1104.703575]
[ 1104.703575] testing speed of async lrw(twofish) encryption
[ 1104.713108] test 0 (256 bit key, 16 byte blocks): 3555452
operations in 1 seconds (56887232 bytes)
[ 1105.718261] test 1 (256 bit key, 64 byte blocks): 1617632
operations in 1 seconds (103528448 bytes)
[ 1106.724924] test 2 (256 bit key, 256 byte blocks): 495199
operations in 1 seconds (126770944 bytes)
[ 1107.731442] test 3 (256 bit key, 1024 byte blocks): 137358
operations in 1 seconds (140654592 bytes)
[ 1108.738065] test 4 (256 bit key, 8192 byte blocks): 17637
operations in 1 seconds (144482304 bytes)
[ 1109.740593] test 5 (320 bit key, 16 byte blocks): 3478175
operations in 1 seconds (55650800 bytes)
[ 1110.743248] test 6 (320 bit key, 64 byte blocks): 1591957
operations in 1 seconds (101885248 bytes)
[ 1111.749911] test 7 (320 bit key, 256 byte blocks): 493803
operations in 1 seconds (126413568 bytes)
[ 1112.756430] test 8 (320 bit key, 1024 byte blocks): 137066
operations in 1 seconds (140355584 bytes)
[ 1113.763034] test 9 (320 bit key, 8192 byte blocks): 17288
operations in 1 seconds (141623296 bytes)
[ 1114.769587] test 10 (384 bit key, 16 byte blocks): 3576437
operations in 1 seconds (57222992 bytes)
[ 1115.776232] test 11 (384 bit key, 64 byte blocks): 1587771
operations in 1 seconds (101617344 bytes)
[ 1116.782890] test 12 (384 bit key, 256 byte blocks): 493841
operations in 1 seconds (126423296 bytes)
[ 1117.789396] test 13 (384 bit key, 1024 byte blocks): 137324
operations in 1 seconds (140619776 bytes)
[ 1118.795993] test 14 (384 bit key, 8192 byte blocks): 17625
operations in 1 seconds (144384000 bytes)
[ 1119.802548]
[ 1119.802548] testing speed of async lrw(twofish) decryption
[ 1119.811940] test 0 (256 bit key, 16 byte blocks): 3590161
operations in 1 seconds (57442576 bytes)
[ 1120.817198] test 1 (256 bit key, 64 byte blocks): 1623745
operations in 1 seconds (103919680 bytes)
[ 1121.823872] test 2 (256 bit key, 256 byte blocks): 482001
operations in 1 seconds (123392256 bytes)
[ 1122.830398] test 3 (256 bit key, 1024 byte blocks): 133842
operations in 1 seconds (137054208 bytes)
[ 1123.836992] test 4 (256 bit key, 8192 byte blocks): 17195
operations in 1 seconds (140861440 bytes)
[ 1124.843536] test 5 (320 bit key, 16 byte blocks): 3536998
operations in 1 seconds (56591968 bytes)
[ 1125.850156] test 6 (320 bit key, 64 byte blocks): 1625698
operations in 1 seconds (104044672 bytes)
[ 1126.856830] test 7 (320 bit key, 256 byte blocks): 482518
operations in 1 seconds (123524608 bytes)
[ 1127.863348] test 8 (320 bit key, 1024 byte blocks): 133672
operations in 1 seconds (136880128 bytes)
[ 1128.869959] test 9 (320 bit key, 8192 byte blocks): 16860
operations in 1 seconds (138117120 bytes)
[ 1129.876469] test 10 (384 bit key, 16 byte blocks): 3637750
operations in 1 seconds (58204000 bytes)
[ 1130.883151] test 11 (384 bit key, 64 byte blocks): 1626131
operations in 1 seconds (104072384 bytes)
[ 1131.889814] test 12 (384 bit key, 256 byte blocks): 483999
operations in 1 seconds (123903744 bytes)
[ 1132.896324] test 13 (384 bit key, 1024 byte blocks): 133598
operations in 1 seconds (136804352 bytes)
[ 1133.902920] test 14 (384 bit key, 8192 byte blocks): 17206
operations in 1 seconds (140951552 bytes)
[ 1134.905485]
[ 1134.905485] testing speed of async xts(twofish) encryption
[ 1134.905501] test 0 (256 bit key, 16 byte blocks): 2908165
operations in 1 seconds (46530640 bytes)
[ 1135.908137] test 1 (256 bit key, 64 byte blocks): 1462715
operations in 1 seconds (93613760 bytes)
[ 1136.914715] test 2 (256 bit key, 256 byte blocks): 506478
operations in 1 seconds (129658368 bytes)
[ 1137.921320] test 3 (256 bit key, 1024 byte blocks): 148018
operations in 1 seconds (151570432 bytes)
[ 1138.927924] test 4 (256 bit key, 8192 byte blocks): 19435
operations in 1 seconds (159211520 bytes)
[ 1139.934451] test 5 (384 bit key, 16 byte blocks): 2905195
operations in 1 seconds (46483120 bytes)
[ 1140.941116] test 6 (384 bit key, 64 byte blocks): 1454656
operations in 1 seconds (93097984 bytes)
[ 1141.947683] test 7 (384 bit key, 256 byte blocks): 504479
operations in 1 seconds (129146624 bytes)
[ 1142.954280] test 8 (384 bit key, 1024 byte blocks): 148172
operations in 1 seconds (151728128 bytes)
[ 1143.960892] test 9 (384 bit key, 8192 byte blocks): 19433
operations in 1 seconds (159195136 bytes)
[ 1144.967410] test 10 (512 bit key, 16 byte blocks): 2904583
operations in 1 seconds (46473328 bytes)
[ 1145.974091] test 11 (512 bit key, 64 byte blocks): 1501387
operations in 1 seconds (96088768 bytes)
[ 1146.980652] test 12 (512 bit key, 256 byte blocks): 504501
operations in 1 seconds (129152256 bytes)
[ 1147.987254] test 13 (512 bit key, 1024 byte blocks): 148180
operations in 1 seconds (151736320 bytes)
[ 1148.993842] test 14 (512 bit key, 8192 byte blocks): 19439
operations in 1 seconds (159244288 bytes)
[ 1150.000380]
[ 1150.000380] testing speed of async xts(twofish) decryption
[ 1150.009770] test 0 (256 bit key, 16 byte blocks): 3007004
operations in 1 seconds (48112064 bytes)
[ 1151.015056] test 1 (256 bit key, 64 byte blocks): 1534733
operations in 1 seconds (98222912 bytes)
[ 1152.021642] test 2 (256 bit key, 256 byte blocks): 508129
operations in 1 seconds (130081024 bytes)
[ 1153.028246] test 3 (256 bit key, 1024 byte blocks): 144920
operations in 1 seconds (148398080 bytes)
[ 1154.034859] test 4 (256 bit key, 8192 byte blocks): 18870
operations in 1 seconds (154583040 bytes)
[ 1155.041367] test 5 (384 bit key, 16 byte blocks): 3009083
operations in 1 seconds (48145328 bytes)
[ 1156.048040] test 6 (384 bit key, 64 byte blocks): 1535084
operations in 1 seconds (98245376 bytes)
[ 1157.054609] test 7 (384 bit key, 256 byte blocks): 508112
operations in 1 seconds (130076672 bytes)
[ 1158.061215] test 8 (384 bit key, 1024 byte blocks): 145035
operations in 1 seconds (148515840 bytes)
[ 1159.067830] test 9 (384 bit key, 8192 byte blocks): 18890
operations in 1 seconds (154746880 bytes)
[ 1160.070368] test 10 (512 bit key, 16 byte blocks): 3076988
operations in 1 seconds (49231808 bytes)
[ 1161.073040] test 11 (512 bit key, 64 byte blocks): 1540659
operations in 1 seconds (98602176 bytes)
[ 1162.079610] test 12 (512 bit key, 256 byte blocks): 508316
operations in 1 seconds (130128896 bytes)
[ 1163.086195] test 13 (512 bit key, 1024 byte blocks): 144951
operations in 1 seconds (148429824 bytes)
[ 1164.092792] test 14 (512 bit key, 8192 byte blocks): 18865
operations in 1 seconds (154542080 bytes)
--
Regards/Gruss,
Boris.
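
For anyone comparing runs like the one above, here is a minimal sketch of how
the tcrypt result lines could be parsed and turned into throughput figures. It
is not part of the patch or of tcrypt; the names parse_results and
mib_per_second are made up for illustration, and it assumes the log format
shown in this mail (each result wrapped over two lines by the mailer). The
sample lines are copied from the ecb(twofish) encryption run above.

```python
import re

# One tcrypt result, after the mail-wrapped line pair has been joined.
RESULT_RE = re.compile(
    r"test\s+(\d+)\s+\((\d+) bit key, (\d+) byte blocks\):\s+(\d+)\s+"
    r"operations in (\d+) seconds \((\d+) bytes\)"
)


def parse_results(log_text):
    """Return one dict per tcrypt result found in log_text."""
    # Collapse all whitespace so a result split across two lines
    # ("...: 4870055" / "operations in ...") becomes a single record.
    joined = " ".join(log_text.split())
    results = []
    for match in RESULT_RE.finditer(joined):
        test, key_bits, block, ops, secs, nbytes = map(int, match.groups())
        results.append({
            "test": test,
            "key_bits": key_bits,
            "block_bytes": block,
            "ops": ops,
            "seconds": secs,
            "bytes": nbytes,
        })
    return results


def mib_per_second(result):
    """Throughput of one result in MiB/s."""
    return result["bytes"] / result["seconds"] / (1024 * 1024)


# Two results copied verbatim from the log in this mail.
SAMPLE = """\
[ 1014.083829] test 0 (128 bit key, 16 byte blocks): 4870055
operations in 1 seconds (77920880 bytes)
[ 1018.112517] test 4 (128 bit key, 8192 byte blocks): 21777
operations in 1 seconds (178397184 bytes)
"""

results = parse_results(SAMPLE)
for r in results:
    # The byte count reported by tcrypt is operations times block size.
    assert r["ops"] * r["block_bytes"] == r["bytes"]
    print(r["test"], r["block_bytes"], round(mib_per_second(r), 1))
```

Feeding two complete logs through parse_results and dividing the per-test
byte counts would give the relative gain between a patched and an unpatched
run, which is how a figure like the ~8% mentioned above can be checked.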