-----BEGIN PGP SIGNED MESSAGE-----

I have been looking at CryptoAPI with a view towards using the CryptoAPI routines in FreeSWAN 3.X rather than our own. (See http://www.kerneli.org, for those on the CC and BCC lists. I know that certain other organizations are supposed to be developing similar APIs; feel free to forward this to them.)

We have several requirements that we must meet:

  1) the ability to use multiple processors (SMP support)
  2) the ability to use hardware acceleration
  3) the ability to account separately for time spent on crypto vs. other networking.

Note that the FreeSWAN decrypt code is usually invoked from net_rx_action(), so it will typically be single-threaded. (Nothing in net_rx_action() forces this, i.e. there is no lock, but the network bottom half is kicked from the interrupt handlers to run on the same CPU as the interrupt, so unless interrupts ping-pong, networking code tends to stick to a single CPU.) Encrypt code is typically invoked from net_tx_action().

So, for software-implemented ciphers, we want to put all packets that must be encrypted into a queue and process them in a separate kernel thread, with a callback at the end. This moves the crypto work off the single networking CPU and lets its time be accounted separately. If there is in fact hardware involved, then this just turns into queueing the packet to the hardware, and kicking a crypto_bh to invoke the callback when the completion interrupt occurs.

My review of CryptoAPI is somewhat hampered by the fact that I'm not entirely clear on how a piece of hardware is supposed to interface to it. Yes, it needs to provide encrypt/decrypt routines directly rather than relying on the _encrypt/_decrypt simplification. I guess that it should sleep inside them if required, except when atomic is set, in which case one must fall back to software. I had assumed that this might be necessary when doing, for instance, cryptoswap.

Many have talked about lifting this API up and putting a lower-level API underneath. That's what I'm here to do.
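The queue-and-callback flow described above for software ciphers (submit from the packet path, transform in a worker, fire a callback on completion) can be modeled in user space. This is a minimal sketch under stated assumptions: every name (xform_cmd, xform_submit, xform_worker_run, count_done) is hypothetical, a toy XOR stands in for a real cipher, and locking and the kernel thread itself are left out, so the worker body is simply called directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: a user-space model of the proposed
 * queue-and-callback flow, not CryptoAPI or FreeSWAN code. */
struct xform_cmd;
typedef void (*xform_callback)(struct xform_cmd *cmd, int status);

struct xform_cmd {
    struct xform_cmd *next;   /* queue linkage */
    const uint8_t *in;
    uint8_t *out;
    size_t len;
    xform_callback done;      /* invoked once the transform completes */
    void *user;               /* opaque context for the callback */
};

struct xform_queue {
    struct xform_cmd *head, *tail;
};

/* Packet path (e.g. the rx/tx bottom half): append and return, never sleep. */
static void xform_submit(struct xform_queue *q, struct xform_cmd *cmd)
{
    assert(cmd != NULL && cmd->done != NULL);
    cmd->next = NULL;
    if (q->tail != NULL)
        q->tail->next = cmd;
    else
        q->head = cmd;
    q->tail = cmd;
}

/* Stand-in transform: XOR with a constant. This is NOT a cipher. */
static int toy_transform(struct xform_cmd *cmd)
{
    for (size_t i = 0; i < cmd->len; i++)
        cmd->out[i] = (uint8_t)(cmd->in[i] ^ 0x5a);
    return 0;
}

/* Worker body (a kernel thread in the proposal): drain the queue in FIFO
 * order, transform each command, then fire its callback. Returns the
 * number of commands completed. */
static size_t xform_worker_run(struct xform_queue *q)
{
    size_t n = 0;
    struct xform_cmd *cmd;

    while ((cmd = q->head) != NULL) {
        q->head = cmd->next;
        if (q->head == NULL)
            q->tail = NULL;
        cmd->done(cmd, toy_transform(cmd));
        n++;
    }
    return n;
}

/* Example callback: count successful completions in an int passed via ->user. */
static void count_done(struct xform_cmd *cmd, int status)
{
    if (status == 0)
        ++*(int *)cmd->user;
}
```

In the hardware case the worker loop would go away: submission would hand the command to the device, and the completion interrupt (via a crypto_bh, as proposed) would be what calls ->done.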
Looking at the cryptoloop.c file, I don't even see the ATOMIC stuff enabled by default. So, why all the bother? In the case of FreeSWAN with hardware, we do *not* want to sleep. We want a callback. I am looking at doing this.

Some more comments:

Naming of ciphers
=================

I think that there should be a hierarchy of names, where the longest match wins. Transforms should provide relative weights for binding to the less specific names. Specifically:

   aes-cbc
   aes-cbc/software                        (or aes/atomic if you prefer)
   aes-cbc/hardware
   aes-cbc/hardware/hifn/pci/01/00/0       [PCI bus/device/function]
   aes-cbc/hardware/chrysalis/pci/04/06/0
   aes-cbc/hardware/broadcom/csix/5/7/8    [to make something up]

This permits one to attach VERY specifically to the implementation that one wants, while still letting a request for "aes-cbc" get something useful. All of this would happen when a cipher_implementation is registered and looked up.

Hardware vendors
================

Who is left, btw? Chrysalis is out of the cipher chip business, AFAIK. Intel's board has basically gone closed source. Ditto for 3Com's. Neither was a general-purpose crypto board; both did un-auditable IPsec. (You never get a chance to see the output packets to confirm that they were in fact encrypted with the right key, that the key didn't leak, etc.)

That leaves Broadcom and HiFn that I know of. Are there others? Any of non-US origin? I'm still looking for data sheets, free of any NDA and in the public domain as required, that could serve as the basis for an open source driver of non-USA origin. This doesn't have to be for the latest 10Gb/s SPI4.2 CSIX 2 capable product; 100Mb/s half-duplex boards are still useful for getting the APIs right.

The digest/cipher split
=======================

I see that transform_implementation is subclassed into digest_implementation and cipher_implementation. I have some problems with this. Many pieces of hardware can do both at the same time, and can even do some of the IPsec ESP checking along the way.
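Returning briefly to the naming hierarchy proposed under "Naming of ciphers": one possible reading of "longest match wins, with weights for the less specific names" is that a request selects every registered name it is a '/'-delimited prefix of, an exact match wins outright, and otherwise the candidate with the highest weight wins. Below is a minimal C sketch under that assumption; xform_impl, xform_lookup and the weight values are hypothetical, not part of CryptoAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: full hierarchical name plus a weight used
 * when a less specific name is requested. */
struct xform_impl {
    const char *name;
    int weight;
};

/* True if `request` equals `impl_name` or is one of its ancestors.
 * The prefix must end on a '/' boundary, so "aes" does not match "aes-cbc". */
static int name_covers(const char *request, const char *impl_name)
{
    size_t n = strlen(request);
    return strncmp(request, impl_name, n) == 0 &&
           (impl_name[n] == '\0' || impl_name[n] == '/');
}

/* Exact (most specific) match wins; otherwise the highest-weight
 * implementation under the requested name; NULL if nothing matches. */
static const struct xform_impl *
xform_lookup(const struct xform_impl *tab, size_t count, const char *request)
{
    const struct xform_impl *best = NULL;

    assert(request != NULL && (tab != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        if (!name_covers(request, tab[i].name))
            continue;
        if (strcmp(request, tab[i].name) == 0)
            return &tab[i];
        if (best == NULL || tab[i].weight > best->weight)
            best = &tab[i];
    }
    return best;
}
```

Requiring the match to end on a '/' boundary keeps a partial component such as "aes-cbc/hard" from silently binding to a hardware implementation.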
Further, there is compression. Compression is basically identical to cryptography: a buffer goes in and a transformed buffer comes out. (There is also ongoing research on doing both at the same time. From what I hear, a fellow now at Nortel may have succeeded.) Of course, hardware can do all of these things too.

So, I propose that all operations are essentially "encode"/"decode". A straight digest only ever does "encode".

The digest routines follow the original MD5 libraries, with open/update/etc. exposed directly. It isn't clear to me that this is really a useful interface for a lot of applications, and it certainly cannot be swapped out for a hardware implementation.

The operation queue
===================

I would propose that all functions take a "struct transform_command *" as an argument, defined essentially like this:

    struct transform_command {
        struct list_head          tc_cmdqueue;
        struct cipher_context    *tc_context;
        transform_unit_callback   tc_callback;
        cipher_usercontext        tc_user;           /* whatever the user callback wants */
        unsigned int              tc_flags;
        u8                        tc_iv[MAX_IV_SIZE]; /* filled in if TC_FLAGS_GENERATE_IV is set */
        const u8                 *tc_in;
        u8                       *tc_out;            /* if NULL, use tc_in */
        u8                       *tc_mac;            /* must point to space of tc_macsize */
        size_t                    tc_insize;         /* size of input buffer */
        size_t                    tc_outsize;        /* size of output buffer */
        size_t                    tc_macsize;        /* size of MAC output buffer */
        size_t                    tc_resultsize;     /* amount of output buffer used */
    };

    #define TC_FLAGS_GENERATE_IV (1<<0)  /* if set, then IV must be generated */

Unfortunately, this is too big: on a 32-bit machine it is 52 bytes + MAX_IV_SIZE. Making all of the sizes 16-bit integers and moving the IV out of line (storing a pointer instead) would get us down to 44 bytes. To be memory efficient, we need to fit this into the 48 bytes of skb->cb, which avoids having to allocate another control structure for each packet.
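The byte counts above can be checked with a field-by-field tally. This sketch assumes (none of it is stated in the original) a 32-bit target with 4-byte pointers, size_t and unsigned int, that cipher_usercontext and transform_unit_callback are pointer-sized, and that MAX_IV_SIZE is a multiple of 4 so no padding appears. It totals both the layout above and the compacted layout that follows; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 32-bit (ILP32) sizes; not taken from any real header. */
enum { SZ_PTR = 4, SZ_SIZE_T = 4, SZ_UINT = 4, SZ_U16 = 2, SKB_CB_BYTES = 48 };

/* First layout: inline IV array and four size_t length fields. */
static size_t first_layout_bytes(size_t max_iv_size)
{
    assert(max_iv_size % SZ_PTR == 0); /* keeps the no-padding assumption true */
    return 2 * SZ_PTR      /* tc_cmdqueue: list_head {next, prev} */
         + SZ_PTR          /* tc_context */
         + SZ_PTR          /* tc_callback */
         + SZ_PTR          /* tc_user (assumed pointer-sized) */
         + SZ_UINT         /* tc_flags */
         + max_iv_size     /* tc_iv[MAX_IV_SIZE] */
         + 3 * SZ_PTR      /* tc_in, tc_out, tc_mac */
         + 4 * SZ_SIZE_T;  /* insize, outsize, macsize, resultsize */
}

/* Compacted layout: IV by pointer, tc_macsize dropped, and the output
 * buffer size sharing one u16 union with the result size. */
static size_t revised_layout_bytes(void)
{
    return 2 * SZ_PTR      /* tc_cmdqueue */
         + 3 * SZ_PTR      /* tc_context, tc_callback, tc_user */
         + SZ_UINT         /* tc_flags */
         + 4 * SZ_PTR      /* tc_iv, tc_in, tc_out, tc_mac */
         + SZ_U16          /* tc_insize */
         + SZ_U16;         /* tc_output union */
}
```

With an assumed MAX_IV_SIZE of 16 (a 128-bit block cipher IV), the first layout comes to 68 bytes, well over the 48-byte skb->cb, while the compacted one comes to 44.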
The result is therefore:

    struct transform_command {
        struct list_head          tc_cmdqueue;
        struct cipher_context    *tc_context;
        transform_unit_callback   tc_callback;
        cipher_usercontext        tc_user;       /* whatever the user callback wants */
        unsigned int              tc_flags;
        u8                       *tc_iv;         /* IV stored out of line */
        const u8                 *tc_in;
        u8                       *tc_out;        /* if NULL, use tc_in */
        u8                       *tc_mac;        /* must have room for the digest's MAC */
        u16                       tc_insize;     /* size of input buffer */
        union {
            u16 tc_buffersize;                   /* size of output buffer */
            u16 tc_resultsize;                   /* number of bytes in output */
        } tc_output;
    };

Yes, it is necessary to provide the output size as well as the input size somehow: decompression can expand the data, so it must bounds-check the output. We reuse the output buffer size field as the result size (the tc_output union). tc_macsize is also dropped, as I believe that the MAC size will always be the same for a given digest. Note that the operation is now implied by the cipher_context, or we can steal some bits from tc_flags for it.

Compression
===========

I am hoping that this interface will also apply to compression algorithms. There are already a number of copies of libz in the kernel, so getting a framework for compression in is very valuable.

Name of project
===============

The project should be renamed "dataxform" or "compressapi", since the libz replacement stuff could be mainlined.

]       ON HUMILITY: to err is human. To moo, bovine.           |  firewalls  [
]   Michael Richardson, Sandelman Software Works, Ottawa, ON    |net architect[
] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[
] panic("Just another NetBSD/notebook using, kernel hacking, security guy");  [

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1
Comment: Finger me for keys

iQCVAwUBPSocfIqHRg3pndX9AQEuvwP+JpIOj24SaES5Nd5ZgNpXmlP3aSPtBP/u
5og1eHEyYl+kh339UMs6D2QWvspzPiyACdBa9YnRaNtDdiMj0jJaNaYeIHnvUwH3
GulQSgJWPqZxkq/LIsRv6hbMik0bnbQ2h+5sEfzNPRRiYLXIdmCXbVwEFYvIujMB
qhCBPOrcJuk=
=kuT6
-----END PGP SIGNATURE-----
-
Linux-crypto:  cryptography in and on the Linux system
Archive:       http://mail.nl.linux.org/linux-crypto/