On Wed, Oct 16, 2024 at 09:12:41AM +0200, Ard Biesheuvel wrote:
> > I'd recommend calling the file crc32-4way.S and the functions
> > crc32*_arm64_4way(), rather than crc32-pmull.S and crc32*_pmull().  This would
> > avoid confusion with a CRC implementation that is actually based entirely on
> > pmull (which is possible).
>
> I'm well aware :-)
>
> commit 8fefde90e90c9f5c2770e46ceb127813d3f20c34
> Author: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Date:   Mon Dec 5 18:42:27 2016 +0000
>
>     crypto: arm64/crc32 - accelerated support based on x86 SSE implementation
>
> commit 598b7d41e544322c8c4f3737ee8ddf905a44175e
> Author: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Date:   Mon Aug 27 13:02:45 2018 +0200
>
>     crypto: arm64/crc32 - remove PMULL based CRC32 driver
>
> I removed it because it wasn't actually faster, although that might be
> different on modern cores.

The PMULL-based code removed by commit 598b7d41e544 was only 4-wide.  On Apple
M1, a 12-wide PMULL-based CRC32 is actually faster than 4-way CRC32, especially
if the eor3 instruction from the sha3 extension is utilized.  This was not the
case on non-Apple CPUs I tested (in 2022), though.  12-wide is very wide and is
a bit inconvenient, and IMO it's not worth doing in the kernel at this point.
It would be interesting to test the very latest CPUs, though.

- Eric