On Dec 6, 2021, at 14:14, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> On Tue, 30 Nov 2021 at 07:57, Bae, Chang Seok <chang.seok.bae@xxxxxxxxx> wrote:
>>
>>
>> No, these two instruction sets are separate. So I think no room to share the
>> ASM code.
>
> On arm64, we have
>
> aes-ce.S, which uses AES instructions to implement the AES core transforms
>
> aes-neon.S, which uses plain NEON instructions to implement the AES
> core transforms
>
> aes-modes.S, which can be combined with either of the above, and
> implements the various chaining modes (ECB, CBC, CTR, XTS, and a
> helper for CMAC, CBCMAC and XMAC)
>
> If you have two different primitives for performing AES transforms
> (the original round by round one, and the KL one that does 10 or 14
> rounds at a time), you should still be able to reuse most of the code
> that implements the non-trivial handling of the chaining modes.

Yes, no question that this would be better for maintainability.

However, besides the fact that a KL instruction performs multiple rounds at
once, some AES-KL instructions have register constraints. E.g.
AESENCWIDE256KL always uses XMM0-7 for the input blocks.

Today, the AES-NI code maintains 32-bit compatibility, e.g. it clobbers
XMM2-3 for the key and the input vector. So I think sharing the code would
make the AES-KL code inefficient and even ugly because of the register
constraint. E.g. the AES-KL code does use XMM9-10 for the key and an input
vector, but it has to move them around just for the sake of code sharing.

Thanks,
Chang