Hi Russell,

Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx> writes:

> On Sat, Oct 10, 2015 at 12:37:33AM +0200, Arnaud Ebalard wrote:
>> Hi Russell,
>>
>> Russell King <rmk+kernel@xxxxxxxxxxxxxxxx> writes:
>>
>> > As all the import functions and export functions are virtually
>> > identical, factor out their common parts into a generic
>> > mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively. This
>> > performs the actual import or export, and we pass the data pointers and
>> > length into these functions.
>> >
>> > We have to switch a % const operation to do_div() in the common import
>> > function to avoid provoking gcc to use the expensive 64-bit by 64-bit
>> > modulus operation.
>> >
>> > Signed-off-by: Russell King <rmk+kernel@xxxxxxxxxxxxxxxx>
>>
>> Thanks for the refactoring and for the fixes. All patches look good to
>> me. Out of curiosity, can I ask what perf you get w/ openssh or openssl
>> using AF_ALG and the CESA?
>
> I would do, but it seems this AF_ALG plugin for openssl isn't
> actually using it for encryption. When I try:
>
>   openssl speed -engine af_alg aes-128-cbc
>
> I get results for using openssl's software implementation. If I do:
>
>   openssl speed -engine af_alg md5
>
> then I get results from using the kernel's MD5. Hence, I think the
> only thing that I think openssh is using it for is the digest stuff,
> not the crypto itself. I can't be certain about that though.
>
> I've tried debugging the af_alg engine plugin, but I'm not getting
> very far (I'm not an openssl hacker!) I see it registering the
> function to get the ciphers (via ENGINE_set_ciphers), and I see this
> called several times, returning a list of NID_xxx values describing
> the methods it supports, which includes aes-128-cbc. However,
> unlike the equivalent digest function, I never see it called
> requesting any of the ciphers. Maybe it's an openssl bug, or a
> "feature" preventing hardware crypto? Maybe something is missing
> from its initialisation?
> I've no idea yet. It seems I'm not alone
> in this - this report from April 2015 is exactly what I'm seeing:
>
> https://mta.openssl.org/pipermail/openssl-users/2015-April/001124.html
>
> However, I'm coming to the conclusion that AF_ALG with openssl is a
> dead project, and the only interface that everyone is using for that
> is cryptodev - probably contrary to Herbert and/or DaveM's wishes. For
> example, the openwrt guys seem to only support cryptodev, according to
> their wiki page on the subject of hardware crypto:
>
> http://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators
>
> Here's the references to code for AF_ALG with openssl I've found so far:
>
> Original af_alg plugin (dead):
>
> http://src.carnivore.it/users/common/af_alg/
>
> 3rd party "maintained" af_alg openssl plugin, derived from commit
> 1851bbb010c38878c83729be844f168192059189 in the above repo but with
> no history:
>
> https://github.com/RidgeRun/af-alg-rr
>
> and that doesn't contain any changes to the C code originally committed.
> Whether this C code contains changes or not is anyone's guess: there's
> no way to refer back to the original repository.
>
> Anyway, here's the digest results:
>
> Software:
> The 'numbers' are in 1000s of bytes per second processed.
> type           16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> md5           13948.89k    42477.61k   104619.41k   165140.82k   199273.13k
> sha1          13091.91k    36463.89k    75393.88k   103893.33k   117104.50k
> sha256        13573.92k    30492.25k    52700.33k    64247.81k    68722.69k
>
> Hardware:
> The 'numbers' are in 1000s of bytes per second processed.
> type           16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> md5            3964.55k    13782.11k    43181.71k   180263.38k  1446616.18k
> sha1           4609.16k     8922.35k    35422.87k   333575.31k  2122547.20k
> sha256        13519.62k    30484.10k    52547.47k    64285.21k    68530.60k
>
> There's actually something suspicious while running these tests:
>
> Doing md5 for 3s on 16 size blocks: 32212 md5's in 0.13s
> Doing md5 for 3s on 64 size blocks: 23688 md5's in 0.11s
> Doing md5 for 3s on 256 size blocks: 23615 md5's in 0.14s
> Doing md5 for 3s on 1024 size blocks: 22885 md5's in 0.13s
> Doing md5 for 3s on 8192 size blocks: 15893 md5's in 0.09s
> Doing sha1 for 3s on 16 size blocks: 31688 sha1's in 0.11s
> Doing sha1 for 3s on 64 size blocks: 23700 sha1's in 0.17s
> Doing sha1 for 3s on 256 size blocks: 23523 sha1's in 0.17s
> Doing sha1 for 3s on 1024 size blocks: 22803 sha1's in 0.07s
> Doing sha1 for 3s on 8192 size blocks: 15546 sha1's in 0.06s
> Doing sha256 for 3s on 16 size blocks: 2518030 sha256's in 2.98s
> Doing sha256 for 3s on 64 size blocks: 1419416 sha256's in 2.98s
> Doing sha256 for 3s on 256 size blocks: 613738 sha256's in 2.99s
> Doing sha256 for 3s on 1024 size blocks: 187080 sha256's in 2.98s
> Doing sha256 for 3s on 8192 size blocks: 25013 sha256's in 2.99s
>
> from the hardware - note the "in" figures are ridiculously low, yet
> they do wait 3s for each test. Also, the sha256 results are close
> enough to being the software version.
>
> No ideas on any of this yet... but I'm not about to start digging in
> the openssl code to try and work out what it's up to. As I say, I
> think AF_ALG with openssl is a dead project.

Thanks for the time you took to assemble the information in your
previous email. Yesterday, when reading your patches, I ended up on [1],
where Marek (added him to Cc: list) basically has the same kind of
conclusion as yours, i.e.
openssl w/ cryptodev is what currently works better even if AF_ALG is
the expected target for the kernel to provide access to hardware engines
to userland apps.

I had a lot of performance results at various levels (tcrypt module on
variations of the drivers (tasklet, threaded irq, full polling, etc),
IPsec tunnel and transport mode through to see how it behaves w/ two
mvneta instances also eating CPU cycles for incoming/outgoing packets)
but those were done on an encryption use case. Some are provided in
[2]. In an early (read dirty) polling-based version of the driver, the
CESA on an Armada 370 (mirabox) was verified to be capable of near
100MB/s on buffers of 1500+ bytes for AES CBC encryption. The current
version of the driver is not as good (say half that value) but it
behaves better. A Mirabox can easily route 1500-byte packets at 100MB/s
between its two interfaces but when you mix both using IPsec in tunnel
mode on one side, you end up w/ perfs between 10 and 15MB/s, IIRC.

I think it's interesting to see where it ends up w/ the engine exposed
to userland consumers (e.g. sth like SSH). I cannot promise a huge
amount of time but I'll try and find some to play w/ AF_ALG using
openssl and CESA in the coming weeks.

Cheers,

a+

[1]: http://events.linuxfoundation.org/sites/events/files/slides/lcj-2014-crypto-user.pdf
[2]: http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/336599.html
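
A note on the "suspicious" hardware figures quoted above: the table
values can be reproduced from the "Doing md5 for 3s ..." lines as
ops * block size / reported seconds. A plausible explanation, offered
here as an assumption and not something verified on the CESA, is that
openssl speed divides by CPU user time unless -elapsed is given, so
work done in the kernel/engine barely counts. The sketch below redoes
the arithmetic for two md5 rows and compares it with a wall-clock
estimate over the 3s test window. The function and variable names are
illustrative only.

```python
def throughput_k(ops: int, block_bytes: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit used in the tables."""
    return ops * block_bytes / seconds / 1000.0


# (block size in bytes, operations completed, seconds reported by openssl)
# taken from the quoted "Doing md5 for 3s ..." lines for the hardware run.
MD5_HW = [
    (16, 32212, 0.13),
    (8192, 15893, 0.09),
]

# Assumption: each test really ran for about 3s of wall-clock time.
TEST_SECONDS = 3.0

for size, ops, reported in MD5_HW:
    as_reported = throughput_k(ops, size, reported)
    wall_clock = throughput_k(ops, size, TEST_SECONDS)
    print(f"md5 {size:5d} bytes: reported {as_reported:.2f}k, "
          f"wall-clock estimate {wall_clock:.2f}k")
```

For 8192-byte blocks this gives 1446616.18k as reported, but only about
43398.49k against the 3s window, i.e. below the 199273.13k software
figure. If the assumption holds, re-running with
"openssl speed -elapsed -engine af_alg md5" should give comparable
numbers.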