On Wed, Oct 2, 2019 at 4:17 PM Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> Expose the accelerated NEON ChaCha routine directly as a symbol
> export so that users of the ChaCha library can use it directly.

Eric had some nice code for ChaCha for certain ARM cores that lived in
Zinc as chacha20-unrolled-arm.S. That code was used on cores where the
NEON implementation performs poorly and on cores without NEON at all.
The condition for it was:

	switch (read_cpuid_part()) {
	case ARM_CPU_PART_CORTEX_A7:
	case ARM_CPU_PART_CORTEX_A5:
		/* The Cortex-A7 and Cortex-A5 do not perform well with the NEON
		 * implementation but do incredibly with the scalar one and use
		 * less power.
		 */
		break;
	default:
		chacha20_use_neon = elf_hwcap & HWCAP_NEON;
	}

	...

	for (;;) {
		if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && chacha20_use_neon &&
		    len >= CHACHA20_BLOCK_SIZE * 3 && simd_use(simd_context)) {
			const size_t bytes = min_t(size_t, len, PAGE_SIZE);

			chacha20_neon(dst, src, bytes, ctx->key, ctx->counter);
			ctx->counter[0] += (bytes + 63) / 64;
			len -= bytes;
			if (!len)
				break;
			dst += bytes;
			src += bytes;
			simd_relax(simd_context);
		} else {
			chacha20_arm(dst, src, len, ctx->key, ctx->counter);
			ctx->counter[0] += (len + 63) / 64;
			break;
		}
	}

It's another instance in which the generic code was totally optimized
out of Zinc builds.

Did these changes make it into the existing tree?
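To make the dispatch rule above concrete, here is a minimal userspace sketch that models only the path selection and block-counter bookkeeping from the quoted Zinc code. It performs no ChaCha arithmetic and has no SIMD context handling (simd_use()/simd_relax() are not modeled). All names here (enum cpu_part, use_neon(), chacha_dispatch(), struct dispatch_stats, CHUNK_SIZE) are hypothetical illustrations, not kernel or Zinc APIs. CHUNK_SIZE stands in for PAGE_SIZE and is assumed to be 4096.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHACHA_BLOCK_SIZE 64
#define CHUNK_SIZE 4096 /* hypothetical stand-in for PAGE_SIZE */

/* Hypothetical stand-in for read_cpuid_part() results. */
enum cpu_part { CPU_CORTEX_A5, CPU_CORTEX_A7, CPU_OTHER };

/* Mirrors the switch: A5/A7 always take the scalar path, even with NEON. */
static int use_neon(enum cpu_part part, int has_neon)
{
	if (part == CPU_CORTEX_A5 || part == CPU_CORTEX_A7)
		return 0;
	return has_neon;
}

struct dispatch_stats {
	size_t neon_calls;   /* times the (modeled) NEON path ran */
	size_t scalar_calls; /* times the (modeled) scalar path ran */
	uint32_t counter;    /* block counter, like ctx->counter[0] */
};

/* Mirrors the for (;;) loop: NEON handles chunks of up to CHUNK_SIZE
 * while at least 3 blocks remain; the scalar path takes the rest. */
static void chacha_dispatch(struct dispatch_stats *st, size_t len, int neon)
{
	assert(st != NULL);
	for (;;) {
		if (neon && len >= CHACHA_BLOCK_SIZE * 3) {
			size_t bytes = len < CHUNK_SIZE ? len : CHUNK_SIZE;

			st->neon_calls++;
			/* Round up: a trailing partial block still uses a counter value. */
			st->counter += (bytes + 63) / 64;
			len -= bytes;
			if (!len)
				break;
		} else {
			st->scalar_calls++;
			st->counter += (len + 63) / 64;
			break;
		}
	}
}
```

For example, with NEON enabled, a 4196-byte request takes one 4096-byte NEON chunk (counter 64); the remaining 100 bytes fall below the three-block threshold and go to the scalar path, which adds two more blocks (counter 66). Either path ends with the counter advanced by the number of 64-byte blocks consumed, rounded up.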