Hi,

I have a performance problem with aes-xxx-cbc in EVP mode on some CPUs: throughput drops from about 70 MB/s to about 30 MB/s. It seems that the vpaes implementation is not a good choice for all CPUs that support SSSE3. (I know that it speeds things up a lot on many Intel CPUs.)

Tested CPUs that show the problem:

AMD E-350
AMD E2-1800
AMD A4-5000 (only noticeable when disabling AES-NI)
AMD FX8150 (only noticeable when disabling AES-NI)
Intel Celeron J1900
Intel Celeron N2930

I have added some output below from an older OpenSSL on a Linux Mint system, but the behavior is the same with the current 1.0.2a in the IPFire build.

Any ideas how to solve this without disabling vpaes for all CPUs? I already have a patch that disables it for AMD, because I have not found any AMD CPU that is faster with vpaes, but on Intel Core2 it brings a lot of speed.

http://git.ipfire.org/?p=ipfire-2.x.git;a=blob;f=src/patches/openssl-1.0.2a_disable_ssse3_for_amd.patch;h=097cc80713ffc592dfe708ba9155591407c34c14;hb=0e2f9b011b8945dbfdfd3cac9fe1a486c48732e1

Regards,
Arne Fitzenreiter
Maintainer IPFire 2.x

------------------------------------------------------------------------
arne@hp-e2 ~ $ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 20
model           : 2
model name      : AMD E2-1800 APU with Radeon(tm) HD Graphics
stepping        : 0
microcode       : 0x500010d
cpu MHz         : 850.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips        : 3393.76
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

---- other 4 cores removed ----

For reference, without -evp:

hp-e2 ~ # openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 4735277 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1244427 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 316282 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 209266 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 26337 aes-256 cbc's in 2.99s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      25254.81k    26636.56k    27079.66k    71668.36k    72158.09k

Now with -evp:

hp-e2 ~ # openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 4915660 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 1278970 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 324633 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 81472 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 8192 size blocks: 10196 aes-256-cbc's in 2.98s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      26392.81k    27467.81k    27794.66k    27995.75k    28028.74k

Now with SSSE3 hidden, so it has to fall back:

hp-e2 ~ # OPENSSL_ia32cap=~0x20000000000 openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 4594852 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 1232170 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 314750 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 207284 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 8192 size blocks: 26242 aes-256-cbc's in 2.99s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Mar 19 15:12:02 UTC 2015
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      24587.84k    26462.71k    26948.49k    71227.79k    71897.81k
hp-e2 ~ #
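
For anyone who wants to check the arithmetic behind the three runs, here is a small Python sketch. It recomputes the 8192-byte throughput from the raw "Doing ..." counts and decodes the OPENSSL_ia32cap mask from the last command. The helper names (throughput_k, mask_bits) are made up for this example and are not part of OpenSSL. The reading of bit 41 as SSSE3 assumes the usual OPENSSL_ia32cap layout, where the upper 32 bits mirror CPUID.1:ECX and SSSE3 is ECX bit 9.

```python
def throughput_k(ops, block_size, seconds):
    """Throughput in units of 1000 bytes per second, as `openssl speed` reports it."""
    return ops * block_size / seconds / 1000.0


def mask_bits(mask):
    """Return the positions of all set bits in a 64-bit capability mask."""
    return [bit for bit in range(64) if (mask >> bit) & 1]


# (operations, elapsed seconds) for 8192-byte blocks, copied from the runs above.
runs = {
    "non-EVP": (26337, 2.99),
    "EVP (vpaes path)": (10196, 2.98),
    "EVP, SSSE3 masked": (26242, 2.99),
}

results = {name: throughput_k(ops, 8192, secs) for name, (ops, secs) in runs.items()}
for name, value in results.items():
    print(f"{name:20s} {value:10.2f}k")

# How much faster the fallback path is than the vpaes path on this CPU.
ratio = results["EVP, SSSE3 masked"] / results["EVP (vpaes path)"]
print(f"fallback / vpaes: {ratio:.2f}x")

# The leading '~' in OPENSSL_ia32cap=~0x20000000000 asks OpenSSL to clear
# the listed bits instead of setting them.
bits = mask_bits(0x20000000000)
print("bits cleared:", bits)

# Assumption: bits 32..63 correspond to CPUID.1:ECX, so bit 41 is ECX bit 9.
print("ECX bit:", bits[0] - 32)
```

With the counts quoted above this gives the same per-run figures as the tables, and a fallback-to-vpaes ratio of roughly 2.5x at 8192-byte blocks on the E2-1800.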