On Mon, Jul 24, 2017 at 07:59:43AM +0100, Ard Biesheuvel wrote: > On 18 July 2017 at 13:06, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote: > > The generic AES driver uses 16 lookup tables of 1 KB each, and has > > encryption and decryption routines that are fully unrolled. Given how > > the dependencies between this code and other drivers are declared in > > Kconfig files, this code is always pulled into the core kernel, even > > if it is usually superseded at runtime by accelerated drivers that > > exist for many architectures. > > > > This leaves us with 25 KB of dead code in the kernel, which is negligible > > in typical environments, but which is actually a big deal for the IoT > > domain, where every kilobyte counts. > > > > Also, the scalar, table based AES routines that exist for ARM, arm64, i586 > > and x86_64 share the lookup tables with AES generic, and may be invoked > > occasionally when the time-invariant AES-NI or other special instruction > > drivers are called in interrupt context, at which time the SIMD register > > file cannot be used. Pulling 16 KB of code and 9 KB of instructions into > > the L1s (and evicting what was already there) when a softirq happens to > > be handled in the context of an interrupt taken from kernel mode (which > > means no SIMD on x86) is also something that we may like to avoid, by > > falling back to a much smaller and moderately less performant driver. > > (Note that arm64 will be updated shortly to supply fallbacks for all > > SIMD based AES implementations, which will be based on the core routines) > > > > For the reasons above, this series refactors the way the various AES > > implementations are wired up, to allow the generic version in > > crypto/aes_generic.c to be omitted from the build entirely. > > > > Patch #1 removes some bogus 'select CRYPTO_AES' statement. > > > > Patch #2 factors out aes-generic's lookup tables, which are shared with > > arch-specific implementations in arch/x86, arch/arm and arch/arm64. > > > > Patch #3 replaces the table based aes-generic.o with a new aes.o based on > > the fixed time cipher, and uses it to fulfil dependencies on CRYPTO_AES. > > > > Patch #4 switches the fallback in the AES-NI code to the new, generic encrypt > > and decrypt routines so it no longer depends on the x86 scalar code or > > [transitively] on AES-generic. > > > > Patch #5 tweaks the ARM table based code to only use 2 KB + 256 bytes worth > > of lookup tables instead of 4 KB. > > > > Patch #6 does the same for arm64 > > > > Patch #7 removes the local copy of the AES sboxes from the arm64 NEON driver, > > and switches to the ones exposed by the new AES core module instead. > > > > Patch #8 updates the Kconfig help text to be more descriptive of what they > > actually control, rather than duplicating AES's wikipedia entry a number of > > times. > > > > v4: - remove aes-generic altogether instead of allow a preference to be set > > Actually, after benchmarking the x86_64 asm AES code, I am not so sure > we should remove AES_GENERIC at all, since it turns out to be faster. > Interestingly, I found a remark by Eric in the git log stating the > same, so if we want to cut down on AES variants, we should probably > start by deleting the x86 code instead. > > So please disregard this for now: I will rework the other stuff I have > so it no longer depends on this, and repost, because it is much more > important for me that that makes it into v4.14. This can wait for > v4.15, as far as I am concerned, and I will benchmark a bit more to > see if we can get rid of the i586 code as well. > Yes I did notice that aes-generic was actually faster. Probably the x86_64-asm implementation should be removed, but it may be worthwhile to try a few different gcc versions to see how well they compile aes-generic. I expect that x86_64-asm used to be faster but gcc has gotten smarter. Also x86_64-asm is only really useful on older CPUs, so ideally it should be benchmarked on an older CPU. Eric