Re: Hardware crypto

On Wed, Sep 27, 2000 at 05:16:24PM +0100, Neil Dunbar wrote:
> Alexander S A Kjeldaas wrote:
> > 
> > I think there are some interesting issues to be solved when we want to
> > get hardware crypto cards running under Linux.  For one, we want to
> > have a queue of processing requests for the device instead of having a
> > synchronous interface like most crypto libraries offer.  We also
> > probably want to use the CPU if the queue starts to have too many
> > entries, or load-balance between several cards, so we need a
> > "crypto-provider" concept.
> 
> So you need an abstraction interface. If we're talking
> kernel here (ie for IPsec/filesystem crypto/stego), then
> all we should need is an abstraction over symmetric key
> operations - IKE is done in userspace, after all. I suppose
> that it would be possible to leave the slot open for
> message digests as well, although I haven't seen a card
> which accelerates MD5/SHA-1, or HMAC over them.
> 

The API in the kerneli patch only deals with ciphers and digests.  I
really haven't thought about adding public-key stuff as I can't think
of any applications for it.  The only additional interface one might
want is for random numbers, and that already exists.  The goal of the
kerneli project has simply been to provide high-performance
ciphers/digests for the kernel that other crypto-projects can build
upon.  

> The only plea that I would make is to not make it too
> fancy - otherwise we end up with CDSA and other such
> monsters.

I agree.  However, there is a case for having a higher-level
interface than the typical vanilla crypto-API you see in OpenSSL.  It
is possible to do IDEA+SHA1 _at the same time_ on a normal Pentium II
processor in roughly the same time it takes to do just one of them.
This is how you do it: You run the IDEA algorithm using the MMX
instruction set, and the SHA1 using plain C code, and then merge the
two.  The reason you can do this "for free" on a modern CPU is that
the issue width is pretty high, while most crypto algorithms are
serial by nature and don't lend themselves to parallel
implementations, so one algorithm alone can't keep the CPU busy.  So when
executing a cipher, most CPUs have lots of extra memory bandwidth and
instruction-issue bandwidth left unused.  Making use of this requires
some complexity in how ciphers/digest algorithms are implemented, but
I plan on trying out this idea in the future.  However since this has
never been done in a crypto API AFAIK, it might require a slightly
higher-level API: something along the lines of a "transform" that is
both a cipher and a digest at the same time.  You should be able to
tell the API to "encrypt this datastream while calculating SHA1 and
doing IP checksumming".
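
As a rough illustration of what such a combined "transform" call could
look like, here is a minimal Python sketch.  It is not the kerneli API:
the function names, the TransformResult type and the 8-byte block size
are made up for illustration, a repeating-key XOR stands in for a real
cipher such as IDEA, and the digest and checksum are assumed to cover
the plaintext.  It only shows the single-pass shape of the interface,
not the instruction-level interleaving itself.

```python
import hashlib
from dataclasses import dataclass


def toy_xor_cipher(block: bytes, key: bytes, offset: int) -> bytes:
    """Stand-in for a real cipher (NOT IDEA): XOR with a repeating key."""
    return bytes(b ^ key[(offset + i) % len(key)] for i, b in enumerate(block))


def ip_checksum_add(acc: int, block: bytes) -> int:
    """Add the 16-bit big-endian words of an even-length block to a running sum."""
    for i in range(0, len(block), 2):
        acc += (block[i] << 8) | block[i + 1]
    return acc


def ip_checksum_finish(acc: int) -> int:
    """Fold the carries back in and take the one's complement."""
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return ~acc & 0xFFFF


@dataclass
class TransformResult:
    ciphertext: bytes
    sha1: bytes
    ip_checksum: int


def encrypt_digest_checksum(data: bytes, key: bytes, block_size: int = 8) -> TransformResult:
    """Hypothetical combined transform: one pass over the data does all three jobs."""
    if block_size % 2:
        raise ValueError("block_size must be even for 16-bit checksum words")
    digest = hashlib.sha1()
    acc = 0
    out = bytearray()
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        out += toy_xor_cipher(block, key, off)   # cipher step
        digest.update(block)                     # digest step (over plaintext)
        # Only the final block can be odd-sized; pad it for the checksum.
        padded = block if len(block) % 2 == 0 else block + b"\x00"
        acc = ip_checksum_add(acc, padded)       # checksum step
    return TransformResult(bytes(out), digest.digest(), ip_checksum_finish(acc))
```

A caller would make one call, e.g. encrypt_digest_checksum(payload, key),
and read all three results from the returned object, instead of making
three separate passes over the same buffer.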

Until the kerneli patch has a screaming implementation of the above,
I'll let the idea rest, but you have been warned :-).

Similarly, in some situations where you want to calculate 3des_cbc of
lots of streams at the same time, you might want to be able to switch
over to vector processing of lots of packets.  Using AltiVec
(PowerPC) or SSE (Intel) vector instructions you can theoretically have a
bitslice implementation of 3des_cbc that can handle 128 streams in
parallel.  This could be interesting for some IPsec gateways where you
can sacrifice some latency for throughput.
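
To make the bitslice idea concrete, here is a small Python sketch.  It
is not DES: the "round" is a made-up 4-bit mixing function, and the
helper names are hypothetical.  The point is only the data layout: bit
j of every stream is packed into one 128-bit word, so each logical
operation processes all 128 streams at once.  Because every lane is an
independent stream, the serial chaining of CBC inside one stream does
not get in the way.

```python
LANES = 128                 # one lane per stream, like a 128-bit vector register
MASK = (1 << LANES) - 1


def to_bitplanes(values, width):
    """Transpose: plane[j] has bit s set iff bit j of values[s] is set."""
    planes = [0] * width
    for s, v in enumerate(values):
        for j in range(width):
            planes[j] |= ((v >> j) & 1) << s
    return planes


def from_bitplanes(planes, count):
    """Inverse transpose: rebuild each stream's value from the bit-planes."""
    return [
        sum(((p >> s) & 1) << j for j, p in enumerate(planes))
        for s in range(count)
    ]


def toy_round_scalar(x, width=4):
    """Reference, one stream at a time: out[j] = x[j] ^ (~x[j+1] & x[j+2])."""
    out = 0
    for j in range(width):
        a = (x >> j) & 1
        b = (x >> ((j + 1) % width)) & 1
        c = (x >> ((j + 2) % width)) & 1
        out |= (a ^ ((b ^ 1) & c)) << j
    return out


def toy_round_bitsliced(planes):
    """Same function, but every XOR/AND/NOT acts on all lanes at once."""
    w = len(planes)
    return [
        planes[j] ^ (~planes[(j + 1) % w] & MASK & planes[(j + 2) % w])
        for j in range(w)
    ]
```

The transposition into and out of bit-planes is the cost that is paid
in latency; the per-round work is then a handful of word-wide logical
operations shared by all streams.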

So keep the API simple unless the numbers tell you otherwise.

astor

-- 
Alexander Kjeldaas                Mail:  astor@xxxxxxx
finger astor@xxxxxxxxxxxxxxxxx for OpenPGP key.

Linux-crypto:  cryptography in and on the Linux system
Archive:       http://mail.nl.linux.org/linux-crypto/

