On Thu, 2010-11-04 at 00:38 -0700, Mathias Krause wrote:
> On 03.11.2010, 23:27 Huang Ying wrote:
> > On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
> >> The AES-NI instructions are also available in legacy mode so the 32-bit
> >> architecture may profit from those, too.
> >>
> >> To illustrate the performance gain here's a short summary of the tcrypt
> >> speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
> >> implementations:
> >>
> >> x86:                       i586         aes-ni       delta
> >> 256 bit, 8kB blocks, ECB:  125.94 MB/s  187.09 MB/s  +48.6%
> >
> > Which method did you use for speed testing?
> >
> > modprobe tcrypt mode=200 sec=<?>
>
> Yes. I used: modprobe tcrypt mode=200 sec=1
>
> > That actually does not work very well for AES-NI, because the AES-NI
> > blkcipher is tested in synchronous mode, and in that mode
> > kernel_fpu_begin/end() must be called for every block, and
> > kernel_fpu_begin/end() is quite slow.
>
> That's what I figured, too. Can this slowdown be avoided by saving and
> restoring the used FPU registers within the assembler implementation or
> would this be even slower?

That would be a customized version of kernel_fpu_begin/end(); I think the
x86 maintainers will not like it. And the benefit may be small too.

> > At the same time, some further optimizations for AES-NI (such as the
> > "ecb-aes-aesni" driver) cannot be tested in that mode, because they
> > are only available in asynchronous mode.
>
> After finding the bug in the second version of the patch I noticed this,
> too.
>
> > When developing AES-NI for x86_64, I used dm-crypt + AES-NI for speed
> > testing, where the AES-NI blkcipher is tested in asynchronous mode, and
> > kernel_fpu_begin/end() is called for every page. Can you use that to
> > test?
>
> But wouldn't this be even slower than the above measurement? I took the
> results for 8kB blocks and a page would only be 4kB ... well, depends on
> what kind of pages you took. IIRC x86-64 not only supports 2MB but also
> 1GB pages ;)

There is another difference between them. In synchronous mode
kernel_fpu_begin/end() is called for every block, while in asynchronous
mode with dm-crypt, kernel_fpu_begin/end() is called for every page. So
although the amount of data per request is smaller, the result will be
better.

> > Or you can add test_acipher_speed (similar to test_ahash_speed) to
> > test the cipher in asynchronous mode.
>
> Maybe I'll try this approach, since it looks like just a minor
> modification of the tcrypt module.

Thanks!

Best Regards,
Huang Ying
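
To put rough numbers on the per-block versus per-page point above, here is a small user-space sketch (not kernel code) that only counts how many kernel_fpu_begin/end() pairs each scheme would need for a given amount of data. The function and macro names are hypothetical, the 4 kB page size is an assumption, and the 16-byte figure is simply the AES block size; it says nothing about the actual cost of each pair.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_SIZE  16u    /* AES block size in bytes */
#define MODEL_PAGE_SIZE 4096u  /* assumption: ordinary 4 kB pages */

/*
 * Hypothetical helper: number of FPU begin/end pairs needed when one
 * pair covers at most `unit` bytes of data (rounded up).
 */
size_t fpu_sections(size_t len, size_t unit)
{
    assert(unit > 0);
    return (len + unit - 1) / unit;
}

/* Synchronous mode as described above: one pair per cipher block. */
size_t fpu_sections_per_block(size_t len)
{
    return fpu_sections(len, AES_BLOCK_SIZE);
}

/* Asynchronous mode with dm-crypt as described above: one pair per page. */
size_t fpu_sections_per_page(size_t len)
{
    return fpu_sections(len, MODEL_PAGE_SIZE);
}
```

Under these assumptions, an 8 kB tcrypt buffer needs 512 begin/end pairs in the per-block scheme, while the same 8 kB handled as two 4 kB pages needs only 2, which is why the smaller request size can still come out ahead.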