Re: [PATCH v2 04/20] crypto: arm/chacha - expose ARM ChaCha routine as library function

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Fri, 4 Oct 2019 15:52:52 +0200

On Wed, Oct 2, 2019 at 4:17 PM Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> Expose the accelerated NEON ChaCha routine directly as a symbol
> export so that users of the ChaCha library can use it directly.

Eric had some nice code for ChaCha for certain ARM cores that lived in
Zinc as chacha20-unrolled-arm.S. This scalar code was used on cores
where the NEON implementation performs poorly and on cores with no NEON
at all. The condition for it was:

        switch (read_cpuid_part()) {
        case ARM_CPU_PART_CORTEX_A7:
        case ARM_CPU_PART_CORTEX_A5:
                /* The Cortex-A7 and Cortex-A5 do not perform well with the NEON
                 * implementation but do incredibly with the scalar one and use
                 * less power.
                 */
                break;
        default:
                chacha20_use_neon = elf_hwcap & HWCAP_NEON;
        }

...

        for (;;) {
                if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && chacha20_use_neon &&
                    len >= CHACHA20_BLOCK_SIZE * 3 && simd_use(simd_context)) {
                        const size_t bytes = min_t(size_t, len, PAGE_SIZE);

                        chacha20_neon(dst, src, bytes, ctx->key, ctx->counter);
                        ctx->counter[0] += (bytes + 63) / 64;
                        len -= bytes;
                        if (!len)
                                break;
                        dst += bytes;
                        src += bytes;
                        simd_relax(simd_context);
                } else {
                        chacha20_arm(dst, src, len, ctx->key, ctx->counter);
                        ctx->counter[0] += (len + 63) / 64;
                        break;
                }
        }
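
To make the dispatch rule in that loop concrete, here is a rough
user-space model of it. This is only a sketch and not the Zinc code:
model_dispatch(), struct model_stats and the MODEL_* constants are
hypothetical names, a 4096-byte page is assumed, and the
IS_ENABLED()/simd_use()/simd_relax() handling and the cipher itself are
left out. It only tracks which path each chunk would take and how far
the block counter advances.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_BLOCK_SIZE 64u   /* ChaCha block size in bytes */
#define MODEL_PAGE_SIZE  4096u /* assumption: 4 KiB pages */

/* Hypothetical bookkeeping for the model; not a kernel structure. */
struct model_stats {
        uint32_t counter;      /* stands in for ctx->counter[0] */
        unsigned neon_calls;   /* chunks that would go to chacha20_neon() */
        unsigned scalar_calls; /* calls that would go to chacha20_arm() */
};

static struct model_stats model_dispatch(size_t len, int use_neon)
{
        struct model_stats st = { 0, 0, 0 };

        for (;;) {
                if (use_neon && len >= MODEL_BLOCK_SIZE * 3) {
                        /* NEON path: at most one page per iteration. */
                        size_t bytes = len < MODEL_PAGE_SIZE ?
                                       len : MODEL_PAGE_SIZE;

                        st.neon_calls++;
                        st.counter += (uint32_t)((bytes + 63) / 64);
                        len -= bytes;
                        if (!len)
                                break;
                } else {
                        /* Scalar path: the whole remainder in one call. */
                        st.scalar_calls++;
                        st.counter += (uint32_t)((len + 63) / 64);
                        break;
                }
        }
        return st;
}
```

Under these assumptions, a 4196-byte request with NEON enabled takes one
page-sized NEON chunk and then hands the 100-byte tail to the scalar
routine, because the tail is shorter than three blocks. With use_neon
clear, as on the Cortex-A7 and Cortex-A5, everything goes through the
scalar routine in a single call.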

It's another instance in which the generic code was totally optimized
out of Zinc builds.

Did these changes make it into the existing tree?