> > > For XTS, you have this additional curve ball being thrown in called
> > > the "tweak". For encryption, the underlying "xts" would need to be
> > > able to chain the tweak, from what I've seen of the source the
> > > implementation cannot do that.
> >
> > You simply use the underlying xts for the first n - 2 blocks and
> > do the last two by hand.
> >
>
> OK, so it appears the XTS ciphertext stealing algorithm does not
> include the peculiar reordering of the 2 final blocks, which means
> that the kernel's implementation of XTS already conforms to the spec
> for inputs that are a multiple of the block size.
>
Yes, for XTS you effectively don't do CTS if it's a 16 byte multiple ...

> The reason I am not a fan of making any changes here is that there are
> no in-kernel users that require ciphertext stealing for XTS, nor is
> anyone aware of any reason why we should be adding it to the userland
> interface. So we are basically adding dead code so that we are
> theoretically compliant in a way that we will never exercise in
> practice.
>
You know, having worked on all kinds of workarounds for silly, irrelevant
(IMHO) corner cases in the inside-secure hardware driver over the past
months just to keep testmgr happy, this is kind of ironic ...

Ciphertext stealing happens to be a *major* part of the XTS specification
(it's not actually XTS without the CTS part!), yet you are suggesting not
to implement it because *you* don't have or know a use case for it.
That seems like a pretty bad argument to me. It's not some minor corner
case that's not supported. The implementation is just *incomplete*
without it.

> Note that for software algorithms such as the bit sliced NEON
> implementation of AES, which can only operate on 8 AES blocks at a
> time, doing the final 2 blocks sequentially is going to seriously
> impact performance.
> This means whatever wrapper we invent around xex()
> (or whatever we call it) should go out of its way to ensure that the
> common, non-CTS case does not regress in performance, and the special
> handling is only invoked when necessary (which will be never).
>
I pretty much made the same argument about all these driver workarounds
slowing down my driver fast path, but that was considered a non-issue.

In this particular case, it should not need to be more than:

    if (unlikely(size & 15)) {
        xts_with_partial_last_block();
    } else {
        xts_with_only_full_blocks();
    }

Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com
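For illustration, here is a minimal userspace sketch of that split, assuming the IEEE 1619 construction: every full block goes through the ordinary XEX loop with the tweak chained by multiplication by alpha, and only a partial final block triggers the extra stealing step on the last two blocks. The names xts_encrypt(), xts_decrypt(), xex_block(), gf128_mul_x() and the toy_encrypt()/toy_decrypt() cipher are all hypothetical. The toy cipher is an insecure stand-in for AES that exists only so the example is self-contained, and none of this is the kernel crypto API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XTS_BS 16 /* AES block size in bytes */

typedef void (*block_fn)(const uint8_t *key, const uint8_t *in, uint8_t *out);

/* HYPOTHETICAL stand-in for AES: an insecure but invertible toy cipher,
 * used only so the sketch is self-contained and can round-trip. */
static void toy_encrypt(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
    for (int i = 0; i < XTS_BS; i++) {
        uint8_t x = in[i] ^ key[i];
        out[i] = (uint8_t)((uint8_t)((x << 3) | (x >> 5)) + i);
    }
}

static void toy_decrypt(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
    for (int i = 0; i < XTS_BS; i++) {
        uint8_t v = (uint8_t)(in[i] - i);
        out[i] = (uint8_t)(((v >> 3) | (v << 5)) ^ key[i]);
    }
}

/* Multiply the tweak by alpha in GF(2^128), little-endian as in IEEE 1619. */
static void gf128_mul_x(uint8_t *t)
{
    uint8_t carry = 0;
    for (int i = 0; i < XTS_BS; i++) {
        uint8_t next = t[i] >> 7;
        t[i] = (uint8_t)((t[i] << 1) | carry);
        carry = next;
    }
    if (carry)
        t[0] ^= 0x87; /* reduce modulo x^128 + x^7 + x^2 + x + 1 */
}

/* One XEX block: out = fn(k1, in ^ t) ^ t. */
static void xex_block(const uint8_t *k1, const uint8_t *t, const uint8_t *in,
                      uint8_t *out, block_fn fn)
{
    uint8_t buf[XTS_BS];
    for (int i = 0; i < XTS_BS; i++)
        buf[i] = in[i] ^ t[i];
    fn(k1, buf, buf);
    for (int i = 0; i < XTS_BS; i++)
        out[i] = buf[i] ^ t[i];
}

/* Encrypt len >= 16 bytes; only a partial last block takes the slow path. */
static int xts_encrypt(const uint8_t *k1, const uint8_t *k2, const uint8_t *iv,
                       const uint8_t *src, uint8_t *dst, size_t len)
{
    uint8_t t[XTS_BS], cc[XTS_BS], pp[XTS_BS];
    size_t full = len / XTS_BS, tail = len % XTS_BS;
    if (len < XTS_BS) return -1;
    toy_encrypt(k2, iv, t); /* T_0 = E_K2(iv) */
    for (size_t j = 0; j < full; j++) { /* plain XEX, chained tweak */
        xex_block(k1, t, src + j * XTS_BS, dst + j * XTS_BS, toy_encrypt);
        gf128_mul_x(t);
    }
    if (tail == 0)
        return 0; /* common case: no ciphertext stealing */
    /* Last full block holds CC = XEX(P_{m-1}, T_{m-1}); t now holds T_m. */
    uint8_t *last = dst + (full - 1) * XTS_BS;
    memcpy(cc, last, XTS_BS);
    memcpy(pp, src + full * XTS_BS, tail);       /* PP = P_m || CC[tail..] */
    memcpy(pp + tail, cc + tail, XTS_BS - tail);
    memcpy(dst + full * XTS_BS, cc, tail);       /* C_m = CC[0..tail) */
    xex_block(k1, t, pp, last, toy_encrypt);     /* C_{m-1} = XEX(PP, T_m) */
    return 0;
}

/* Decrypt: with a partial tail, C_{m-1} is undone with T_m before T_{m-1}. */
static int xts_decrypt(const uint8_t *k1, const uint8_t *k2, const uint8_t *iv,
                       const uint8_t *src, uint8_t *dst, size_t len)
{
    uint8_t t[XTS_BS], t_m[XTS_BS], cc[XTS_BS], pp[XTS_BS];
    size_t full = len / XTS_BS, tail = len % XTS_BS;
    if (len < XTS_BS) return -1;
    toy_encrypt(k2, iv, t); /* the tweak is always encrypted, even here */
    size_t plain = tail ? full - 1 : full;
    for (size_t j = 0; j < plain; j++) {
        xex_block(k1, t, src + j * XTS_BS, dst + j * XTS_BS, toy_decrypt);
        gf128_mul_x(t);
    }
    if (tail == 0)
        return 0;
    memcpy(t_m, t, XTS_BS); /* t is T_{m-1}; derive T_m */
    gf128_mul_x(t_m);
    xex_block(k1, t_m, src + plain * XTS_BS, pp, toy_decrypt); /* PP */
    memcpy(cc, src + full * XTS_BS, tail);       /* CC = C_m || PP[tail..] */
    memcpy(cc + tail, pp + tail, XTS_BS - tail);
    memcpy(dst + full * XTS_BS, pp, tail);       /* P_m = PP[0..tail) */
    xex_block(k1, t, cc, dst + plain * XTS_BS, toy_decrypt); /* P_{m-1} */
    return 0;
}
```

Consider a 37-byte input (two full blocks plus a 5-byte tail). The first block comes out identical to plain full-block XTS, and the 5 tail bytes are the head of what plain XTS would have produced for the second block. Only the second block is recomputed. For 16-byte multiples the loop is all that runs, so the common case pays only for the `tail == 0` check.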