On Fri, 28 Apr 2023 at 13:59, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: > > > > > On Apr 28, 2023, at 5:57 AM, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote: > > > > On Fri, 28 Apr 2023 at 10:44, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote: > >> > >> Scott Mayhew <smayhew@xxxxxxxxxx> wrote: > >>> > >>> diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S > >>> index 0e834a2c062c..477605fad76b 100644 > >>> --- a/arch/arm64/crypto/aes-modes.S > >>> +++ b/arch/arm64/crypto/aes-modes.S > >>> @@ -268,6 +268,7 @@ AES_FUNC_START(aes_cbc_cts_encrypt) > >>> add x4, x0, x4 > >>> st1 {v0.16b}, [x4] /* overlapping stores */ > >>> st1 {v1.16b}, [x0] > >>> + st1 {v1.16b}, [x5] > >>> ret > >>> AES_FUNC_END(aes_cbc_cts_encrypt) > >>> > >>> But I don't know if that change is at all correct! (I've never even > >>> looked at arm64 asm before). If someone who's knowledgeable about this > >>> code could chime in, I'd appreciate it. > >> > >> Ard, could you please take a look at this? > >> > > > > The issue seems to be that the caller expects iv_out to have been > > populated even when doing ciphertext stealing. > > > > This is meaningless, because the output IV can only be used to chain > > additional encrypted data if the split is at a block boundary. The > > ciphertext stealing fundamentally terminates the encryption, and > > produces a block of ciphertext that is shorter than the block size, so > > what the output IV should be is actually unspecified. > > > > IOW, test cases having plain/ciphertext vectors whose size is not a > > multiple of the cipher block size should not specify an expected value > > for the output IV. > > The test cases are extracted from RFC 3962 Appendix B. The > standard clearly expects there to be a non-zero next IV for > plaintext sizes that are not block-aligned. > OK, so this is the Kerberos V specific spec on how to use AES in CBC mode, which appears to describe how to chain multiple CBC encryptions together. CBC-CTS itself does not define this: the IV is an input only, and the reason we happen to return the IV is because a single CBC operation may be broken up into several ones, which can only be done on block boundaries. This is why CBC-CTS itself passes all its tests: a single CBC-CTS encryption only performs ciphertext stealing at the very end, and the next IV is never used in that case. (This is why the CTS mode tests in crypto/testmgr.h don't have iv_out vectors) Note that RFC3962 defines that the penultimate block of CBC-CTS ciphertext is used as the next IV. CBC returns the last ciphertext block as the output IV. It is a happy coincidence that the generic CTS template encapsulates CBC in a way where its output IV ends up in the right place. > Also, these test cases pass on other hardware platforms. > Fair enough. I am not opposed to fixing this, but given that it is the RFC3962 spec that defines that the next IV is the penultimate full block of the previous CBC-CTS ciphertext, we might consider doing the memcpy() in the Kerberos code not in the CBC-CTS implementations. (The 32-bit ARM code will be broken in the same way, and potentially other implementations as well)