On Wed, Sep 27, 2000 at 05:16:24PM +0100, Neil Dunbar wrote:
> Alexander S A Kjeldaas wrote:
> >
> > I think there are some interesting issues to be solved when we want to
> > get hardware crypto cards running under Linux. For one, we want to
> > have a queue of processing requests for the device instead of having a
> > synchronous interface like most crypto libraries offer. We also
> > probably want to use the CPU if the queue starts to have too many
> > entries, or load-balance between several cards, so we need a
> > "crypto-provider" concept.
>
> So you need an abstraction interface. If we're talking
> kernel here (ie for IPsec/filesystem crypto/stego), then
> all we should need is an abstraction over symmetric key
> operations - IKE is done in userspace, after all. I suppose
> that it would be possible to leave the slot open for
> message digests as well, although I haven't seen a card
> which accelerates MD5/SHA-1, or HMAC over them.
>

The API in the kerneli patch only deals with ciphers and digests. I
really haven't thought about adding public-key stuff as I can't think
of any applications for it. The only additional interface one might
want is for random numbers, and that already exists. The goal of the
kerneli project has simply been to provide high-performance
ciphers/digests for the kernel that other crypto-projects can build
upon.

> The only plea that I would make is to not make it too
> fancy - otherwise we end up with CDSA and other such
> monsters.

I agree. However, there is a case for having a higher-level interface
than the typical vanilla crypto-API you see in OpenSSL. It is possible
to do IDEA+SHA1 _at the same time_ on a normal Pentium II processor in
roughly the same time it takes to do just one of them. This is how you
do it: You run the IDEA algorithm using the MMX instruction set, and
the SHA1 using plain C code, and then merge the two.
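A minimal sketch of the "merge the two" structure, in C. This is not the IDEA/MMX plus SHA1 code described above: both algorithms are replaced by small stand-ins so the shape of the loop stays visible. `toy_encrypt_byte` is a hypothetical, insecure XOR "cipher", and the digest is replaced by an RFC 1071 style Internet checksum. All function names are assumptions made for illustration. The point is only that the two computations have independent dependency chains, so a single pass over the data can carry both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real cipher: XOR with a repeating
 * 8-byte key. NOT secure; it only gives the loop an "encrypt" step. */
static uint8_t toy_encrypt_byte(uint8_t in, const uint8_t key[8], size_t i)
{
    return (uint8_t)(in ^ key[i & 7]);
}

/* One pass over the buffer: accumulate a one's-complement checksum of
 * the plaintext (stand-in for the digest) and encrypt in place. The
 * checksum chain and the cipher chain do not depend on each other, so
 * the CPU is free to overlap them. */
static uint16_t fused_encrypt_and_checksum(uint8_t *buf, size_t len,
                                           const uint8_t key[8])
{
    uint64_t sum = 0;   /* wide accumulator, folded to 16 bits at the end */
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i + 1 < len; i += 2) {
        /* digest stand-in: add the next 16-bit big-endian word */
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
        /* cipher stand-in: encrypt the same two bytes */
        buf[i]     = toy_encrypt_byte(buf[i], key, i);
        buf[i + 1] = toy_encrypt_byte(buf[i + 1], key, i + 1);
    }
    if (i < len) {      /* odd trailing byte, padded with zero */
        sum += (uint32_t)buf[i] << 8;
        buf[i] = toy_encrypt_byte(buf[i], key, i);
    }
    while (sum >> 16)   /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A real merged implementation would interleave the instruction streams of the two algorithms by hand (or in assembly) rather than rely on the compiler, but the calling convention is the same: one buffer in, ciphertext and digest out.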
The reason you can do this "for free" on a modern CPU is that the
issue width is pretty high, while most crypto algorithms don't lend
themselves to parallel implementations. So when executing a cipher,
most CPUs have lots of extra memory bandwidth and instruction-issue
bandwidth left unused. Making use of this requires some complexity in
how cipher/digest algorithms are implemented, but I plan on trying
out this idea in the future.

However, since this has never been done in a crypto API AFAIK, it
might require a slightly higher-level API. Something along the lines
of a "transform" that is both a cipher and a digest at the same time.
You should be able to tell the API to "encrypt this datastream while
calculating SHA1 and doing IP checksumming". Until the kerneli patch
has a screaming implementation of the above, I'll let the idea rest,
but you have been warned :-).

Similarly, in some situations where you want to calculate 3des_cbc of
lots of streams at the same time, you might want to be able to switch
over to vector processing of lots of packets. Using AltiVec (PowerPC)
or SSE (Intel) vector instructions you can theoretically have a
bitslice implementation of 3des_cbc that can handle 128 streams in
parallel. This could be interesting for some IPsec gateways where you
can sacrifice some latency for throughput.

So keep the API simple unless the numbers tell you otherwise.

astor

-- 
Alexander Kjeldaas                Mail:  astor@xxxxxxx
finger astor@xxxxxxxxxxxxxxxxx for OpenPGP key.

Linux-crypto:  cryptography in and on the Linux system
Archive:       http://mail.nl.linux.org/linux-crypto/
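To illustrate the bitslice idea mentioned above for the many-streams case, here is a minimal sketch in C. It is not 3des_cbc: `toy_gate_scalar` is a hypothetical one-bit nonlinear function standing in for a fragment of a DES S-box, and all names are assumptions. It uses a plain 64-bit word, so it handles 64 streams per operation; a 128-bit AltiVec or SSE register would give the 128 lanes from the text.

```c
#include <stdint.h>

/* Hypothetical nonlinear 3-bit-to-1-bit function, one stream at a time. */
static unsigned toy_gate_scalar(unsigned a, unsigned b, unsigned c)
{
    return ((a & b) ^ c) & 1u;
}

/* Bitsliced version: bit j of every argument belongs to stream j, so
 * one AND and one XOR evaluate the gate for 64 streams at once. */
static uint64_t toy_gate_bitsliced(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & b) ^ c;
}

/* Gather bit number `bit` from one byte of each of 64 streams into a
 * single word (the "transpose" step that bitslicing requires). */
static uint64_t slice_bit(const uint8_t streams[64], unsigned bit)
{
    uint64_t w = 0;
    unsigned j;

    for (j = 0; j < 64; j++)
        w |= (uint64_t)((streams[j] >> bit) & 1u) << j;
    return w;
}
```

The transpose step is where the latency goes: the gateway has to collect a full batch of packets before the sliced words can be built, which is the throughput-for-latency trade described above.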