On Tue, Feb 22, 2022 at 05:02:16PM +0000, David Laight wrote:
> From: Keith Busch
> > Sent: 22 February 2022 16:32
> >
> > The crc64 table lookup method is inefficient, using a significant number
> > of CPU cycles in the block stack per IO. If available on x86, use a
> > PCLMULQDQ implementation to accelerate the calculation.
> >
> > The assembly from this patch was mostly generated by gcc from a C
> > program using library functions provided by x86 intrinsics, and measures
> > ~20x faster than the table lookup.
>
> I think I'd like to see the C code and compiler options used to
> generate the assembler as comments in the committed source file.
> Either that or reasonable comments in the assembler.

The C code, compiled as "gcc -O3 -msse4 -mpclmul -S", was adapted from
this code found on the internet:

  https://github.com/rawrunprotected/crc/blob/master/crc64.c

I just ported it to Linux, changed the poly parameters and removed the
unnecessary stuff.

I'm okay with dropping this patch from the series for now since I don't
think I'm qualified to write it. :) I just needed something to test the
crypto module registration.
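
For anyone following along, the table lookup method that the patch
description compares against can be sketched roughly as below. This is
only an illustration, not the code from the patch and not the kernel's
own implementation: the names (crc64_table_sketch and friends) are
hypothetical, and the ECMA-182 polynomial in MSB-first form is an
assumption picked for the example, since the patch uses different poly
parameters.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption for illustration only: ECMA-182 polynomial, MSB-first.
 * The patch under discussion uses different polynomial parameters.
 */
#define CRC64_POLY_SKETCH 0x42F0E1EBA9EA3693ULL

/* Hypothetical 256-entry lookup table, filled at runtime. */
static uint64_t crc64_table_sketch[256];

/* Entry i holds the CRC of the single byte i (initial value 0). */
static void crc64_table_init_sketch(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint64_t crc = (uint64_t)i << 56;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & (1ULL << 63)) ?
				(crc << 1) ^ CRC64_POLY_SKETCH : crc << 1;
		crc64_table_sketch[i] = crc;
	}
}

/*
 * Byte-at-a-time update: each input byte costs one table load that
 * depends on the previous result, so the loop cannot be pipelined.
 */
static uint64_t crc64_lookup_sketch(uint64_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--)
		crc = crc64_table_sketch[(crc >> 56) ^ *p++] ^ (crc << 8);
	return crc;
}
```

Call crc64_table_init_sketch() once, then
crc64_lookup_sketch(0, data, len) over the buffer. The serial
dependency on each table load is the per-byte cost that a carry-less
multiply (PCLMULQDQ) approach avoids by folding many bytes per step.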