On Fri, 12 Jan 2018 14:21:15 +0100 Stephan Mueller <smueller@xxxxxxxxxx> wrote:

> Hi,
>
> The kernel crypto API requires the caller to set an IV in the request data
> structure. That request data structure shall define one particular cipher
> operation. During the cipher operation, the IV is read by the cipher
> implementation and eventually the potentially updated IV (e.g. in case of
> CBC) is written back to the memory location the request data structure
> points to.

Silly question: are we obliged to always write it back? In CBC it is obviously the same as the last n bytes of the encrypted message. I guess for ease of handling it makes sense to do so, though.

> AF_ALG allows setting the IV with a sendmsg request, where the IV is stored
> in the AF_ALG context that is unique to one particular AF_ALG socket. Note
> the analogy: an AF_ALG socket is like a TFM, where one recvmsg operation
> uses one request with the TFM from the socket.
>
> AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with
> one recvmsg call, multiple IOVECs can be specified. Each individual IOCB
> (derived from one IOVEC) implies that one request data structure is created
> with the data to be processed by the cipher implementation. The IV that was
> set with the sendmsg call is registered with the request data structure
> before the cipher operation.
>
> In case of an AIO operation, the cipher operation invocation returns
> immediately, queuing the request to the hardware. While the AIO request is
> processed by the hardware, recvmsg processes the next IOVEC, for which
> another request is created. Again, the IV buffer from the AF_ALG socket
> context is registered with the new request and the cipher operation is
> invoked.
>
> You may now see that there is a potential race condition regarding the IV
> handling, because there is *no* separate IV buffer for the different
> requests.
> This is nicely demonstrated with libkcapi using the following command,
> which creates an AIO request with two IOCBs, each encrypting one AES block
> in CBC mode:
>
> kcapi -d 2 -x 9 -e -c "cbc(aes)" -k
> 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i
> 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
>
> When the first AIO request finishes before the 2nd AIO request is
> processed, the returned value is:
>
> 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32
>
> I.e. two blocks, where the IV output from the first request is the IV input
> to the 2nd block.
>
> In case the first AIO request is not completed before the 2nd request
> commences, the result is two identical AES blocks (i.e. both use the same
> IV):
>
> 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71
>
> This inconsistent result may even lead to the conclusion that there can be
> a memory corruption in the IV buffer if both AIO requests write to the IV
> buffer at the same time.
>
> This needs to be solved somehow. I see the following options, which I would
> like to have vetted by the community.

Taking some 'entirely hypothetical' hardware with the following structure for all my responses - it's about as flexible as I think we'll see in the near future, though I'm sure someone has something more complex out there :)

N hardware queues feeding M processing engines in a scheduler driven fashion. Actually we might have P sets of these, but load balancing and tracking and transferring contexts between them is a complexity I think we can ignore. If you want to use more than one of these P sets you'll just have to handle it yourself in userspace.

Note messages may be shorter than IOCBs, which raises another question I've been meaning to ask: are all crypto algorithms obliged to handle unlimited-length IOCBs?
If there are M messages in a particular queue and none elsewhere, it is capable of processing them all at once (and perhaps returning out of order, but we can fudge them back into order in the driver to avoid that additional complexity from an interface point of view).

So I'm going to look at this from the hardware point of view - you have addressed software management well above.

There are three ways context management can be handled (in CBC the context is basically just the IV):

1. Each 'work item' queued on a hardware queue has its IV embedded with the data. This requires external synchronization if we are chaining across multiple 'work items' - note the hardware may have restrictions that mean it has to split large pieces of data up to encrypt them. Not all hardware may support per-'work item' IVs (I haven't done a survey to find out if everyone does...).

2. Each queue has a context assigned. We get a new queue whenever we want a different context. This runs out eventually, but our hypothetical hardware may support a lot of queues. Note this version could be 'faked' by putting a cryptoengine queue in front of the hardware queues.

3. The hardware supports IV dependency tracking in its queues. That is, it can check whether the address pointing to the IV is in use by one of the processing units which has not yet updated the IV ready for chaining with the next message. Note it might use a magic token rather than the IV pointer. For modes without chaining (including counter modes) the IV pointer will inherently always be different. The hardware then simply schedules something else until it can safely run that particular processing unit.

> 1. Require that the cipher implementations serialize any AIO requests that
> have dependencies. I.e. for CBC, requests need to be serialized by the
> driver. For, say, ECB or XTS no serialization is necessary.
There is a certain requirement to do this anyway, as we may have a streaming-type situation where we don't want to have to do the chaining in userspace. Say we send the first X MB block to the hardware, but before it has come back more data arrives that needs decrypting, so we queue that behind it. The IV then needs to be updated automatically (or the code needs to do it when the first work item comes back). If you don't have option 3 above, you have to do this. This is what I was planning to implement for our existing hardware before you raised this question, and I don't think we can get around it being necessary for performance in any case. Setting up IOMMUs etc. is costly, so we want to be doing everything we can before the IV update is ready.

> 2. Change AF_ALG to require a per-request IV. This could be implemented by
> moving the IV submission via CMSG from sendmsg to recvmsg. I.e. the recvmsg
> code path would obtain the IV.
>
> I would tend to favor option 2 as this requires a code change at only one
> location. If option 2 is considered, I would recommend still allowing the
> IV to be set via sendmsg CMSG (to keep the interface stable). If, however,
> the caller provides an IV via recvmsg, this takes precedence.

We definitely want to keep option 1 (which runs on the existing interface and does the magic in the driver) for those who want it. So the only one left is case 3 above, where the hardware is capable of doing the dependency tracking. We can support that in two ways, but one is rather heavyweight in terms of resources:

1) Whenever we want to allocate a new context, we spin up a new socket and effectively associate a single IV with that (and its chained updates), much like we do in the existing interface.

2) We allow token-based tracking of IVs. Userspace code maintains a counter and tags every message, and the initial IV setup, with that counter.
As the socket typically belongs to a userspace process, tag creation can be done in userspace, and it can ensure it doesn't overlap tags (or it'll get the wrong answer). The kernel driver can then handle making sure any internal tokens / addresses are correct. I haven't looked at it in depth, but I would imagine this one would be rather more invasive to support.

> If there are other options, please allow us to learn about them.

Glad we are addressing these use cases and that we have AIO support in general. Makes for a better discussion around whether in-kernel support for these interfaces is actually as effective as moving to userspace drivers...

Jonathan

> Ciao
> Stephan