Re: [PATCH 2/2] arm64/crc32: Implement 4-way interleave using PMULL

On Wed, 16 Oct 2024 at 05:03, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> On Tue, Oct 15, 2024 at 12:41:40PM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@xxxxxxxxxx>
> >
> > Now that kernel mode NEON no longer disables preemption, using FP/SIMD
> > in library code that is not obviously part of the crypto subsystem is
> > no longer problematic, as it will not incur unexpected latencies.
> >
> > So accelerate the CRC-32 library code on arm64 to use a 4-way
> > interleave, using PMULL instructions to implement the folding.
> >
> > On Apple M2, this results in a speedup of 2x to 2.8x for input
> > sizes of 1k to 8k. For smaller sizes, the overhead of preserving and
> > restoring the FP/SIMD register file may not be worth it, so 1k is used
> > as the threshold for choosing this code path.
> >
> > The coefficient tables were generated using code provided by Eric. [0]
> >
> > [0] https://github.com/ebiggers/libdeflate/blob/master/scripts/gen_crc32_multipliers.c
> >
> > Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
> > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > ---
> >  arch/arm64/lib/Makefile      |   2 +-
> >  arch/arm64/lib/crc32-glue.c  |  36 +++
> >  arch/arm64/lib/crc32-pmull.S | 240 ++++++++++++++++++++
> >  3 files changed, 277 insertions(+), 1 deletion(-)
>
> Thanks for doing this!  The new code looks good to me.  4-way does seem like the
> right choice for arm64.
>

Agreed.
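For readers following along: the size-based dispatch described in the quoted patch (take the 4-way path only from 1k upward, because saving and restoring the FP/SIMD register file costs more than it saves on short inputs) might look roughly like the sketch below. This is plain userspace C under stated assumptions, not the code from the patch: `crc32_le_dispatch()`, `crc32_le_4way()`, `simd_usable()` and `CRC32_4WAY_MIN_LEN` are hypothetical names, the `>=` comparison is a guess, and the "accelerated" routine simply defers to the scalar one so the two paths can be compared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 (IEEE 802.3) polynomial. */
#define CRC32_POLY_LE 0xEDB88320u

/* Hypothetical cut-over point mirroring the "1k" threshold in the patch. */
#define CRC32_4WAY_MIN_LEN 1024

/* Raw bitwise CRC-32 update (no pre/post inversion); stands in for the
 * scalar path. */
static uint32_t crc32_le_scalar(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc & 1) ? (crc >> 1) ^ CRC32_POLY_LE : crc >> 1;
	}
	return crc;
}

/* Counts how often the accelerated path is taken (illustration only). */
static unsigned int crc32_4way_calls;

/* Hypothetical stand-in for a "may FP/SIMD be used here?" check such as
 * the kernel's may_use_simd(). Always true in this sketch. */
static bool simd_usable(void)
{
	return true;
}

/* Hypothetical stand-in for the 4-way routine. In the kernel it would be
 * bracketed by kernel_neon_begin()/kernel_neon_end(), which is where the
 * FP/SIMD save/restore cost comes from. Here it defers to the scalar code. */
static uint32_t crc32_le_4way(uint32_t crc, const uint8_t *p, size_t len)
{
	crc32_4way_calls++;
	return crc32_le_scalar(crc, p, len);
}

/* Use the SIMD path only for inputs long enough to amortise the register
 * save/restore, and only when SIMD may be used in the current context. */
static uint32_t crc32_le_dispatch(uint32_t crc, const uint8_t *p, size_t len)
{
	if (len >= CRC32_4WAY_MIN_LEN && simd_usable())
		return crc32_le_4way(crc, p, len);
	return crc32_le_scalar(crc, p, len);
}
```

A 100-byte buffer goes through `crc32_le_scalar()`, while a 2048-byte buffer takes the 4-way path; both return the same CRC, so the threshold is purely a performance knob.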

> I'd recommend calling the file crc32-4way.S and the functions
> crc32*_arm64_4way(), rather than crc32-pmull.S and crc32*_pmull().  This would
> avoid confusion with a CRC implementation that is actually based entirely on
> pmull (which is possible).

I'm well aware :-)

commit 8fefde90e90c9f5c2770e46ceb127813d3f20c34
Author: Ard Biesheuvel <ardb@xxxxxxxxxx>
Date:   Mon Dec 5 18:42:27 2016 +0000

    crypto: arm64/crc32 - accelerated support based on x86 SSE implementation

commit 598b7d41e544322c8c4f3737ee8ddf905a44175e
Author: Ard Biesheuvel <ardb@xxxxxxxxxx>
Date:   Mon Aug 27 13:02:45 2018 +0200

    crypto: arm64/crc32 - remove PMULL based CRC32 driver

I removed it because it wasn't actually faster, although that might be
different on modern cores.

>  The proposed implementation uses the crc32
> instructions to do most of the work and only uses pmull for combining the CRCs.
> Yes, crc32c-pcl-intel-asm_64.S made this same mistake, but it is a mistake, IMO.
>

Yeah, good point.
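To illustrate the split Eric describes (crc32 instructions doing the bulk of the work, with a multiply only used to combine the partial CRCs), here is a portable C sketch. It computes four independent CRCs over consecutive quarters of the buffer, then merges them: the running CRC is multiplied by x^(8 * chunk) mod P, which is equivalent to feeding it `chunk` zero bytes, and XORed with the next partial CRC. All function names are hypothetical. The bitwise loop stands in for the per-lane crc32 instructions, and `gf2_mulmod()` is a bit-at-a-time software stand-in for what the patch does with PMULL and precomputed coefficient tables. The CRC here is the raw reflected CRC-32 update with no pre/post inversion, matching the convention of the kernel's `crc32_le()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 (IEEE 802.3) polynomial. */
#define CRC32_POLY_LE 0xEDB88320u

/* Raw bitwise CRC-32 update (no pre/post inversion). Stands in for the
 * per-lane crc32 instructions. */
static uint32_t crc32_le_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc & 1) ? (crc >> 1) ^ CRC32_POLY_LE : crc >> 1;
	}
	return crc;
}

/* a(x) * b(x) mod P(x), both in bit-reflected form (bit 31 is x^0).
 * Software stand-in for a carryless multiply (PMULL) plus reduction. */
static uint32_t gf2_mulmod(uint32_t a, uint32_t b)
{
	uint32_t prod = 0;

	for (int i = 0; i < 32; i++) {
		if (a & (1u << (31 - i)))	/* a contains the x^i term */
			prod ^= b;		/* b currently holds b_orig * x^i */
		b = (b & 1) ? (b >> 1) ^ CRC32_POLY_LE : b >> 1;
	}
	return prod;
}

/* x^(8 * nbytes) mod P(x), i.e. the "skip nbytes of data" constant,
 * computed by square-and-multiply. A real implementation would use
 * precomputed constants for its fixed chunk sizes instead. */
static uint32_t xpow8n_mod(size_t nbytes)
{
	uint32_t result = 1u << 31;	/* x^0 */
	uint32_t base = 1u << 23;	/* x^8 */

	while (nbytes) {
		if (nbytes & 1)
			result = gf2_mulmod(base, result);
		base = gf2_mulmod(base, base);
		nbytes >>= 1;
	}
	return result;
}

/* Hypothetical 4-way interleaved CRC: four independent CRCs over the four
 * quarters of the buffer, merged with multiply-by-constant and XOR. */
static uint32_t crc32_le_4way_sketch(uint32_t crc, const uint8_t *p, size_t len)
{
	size_t chunk = len / 4;
	/* On real hardware these four streams run concurrently. */
	uint32_t c0 = crc32_le_bitwise(crc, p, chunk);
	uint32_t c1 = crc32_le_bitwise(0, p + chunk, chunk);
	uint32_t c2 = crc32_le_bitwise(0, p + 2 * chunk, chunk);
	uint32_t c3 = crc32_le_bitwise(0, p + 3 * chunk, chunk);
	uint32_t k = xpow8n_mod(chunk);

	/* Horner-style merge: advance the running CRC past one chunk, add the next. */
	uint32_t acc = gf2_mulmod(k, c0) ^ c1;
	acc = gf2_mulmod(k, acc) ^ c2;
	acc = gf2_mulmod(k, acc) ^ c3;

	/* Trailing bytes when len is not a multiple of 4. */
	return crc32_le_bitwise(acc, p + 4 * chunk, len - 4 * chunk);
}
```

With the usual inversion, `~crc32_le_4way_sketch(~0u, "123456789", 9)` gives the standard CRC-32 check value 0xCBF43926, the same as the plain bitwise routine. Only the merge step needs a multiplier, and it needs one constant per chunk length, so a fixed-size inner loop can use a small table of precomputed constants. That is presumably the role of the coefficient tables mentioned in the patch.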
