Hi Eric,

On 2020/11/5 1:57, Eric Biggers wrote:
> On Tue, Nov 03, 2020 at 08:15:06PM +0800, l00374334 wrote:
>> From: liqiang <liqiang64@xxxxxxxxxx>
>>
>> In the libz library, the checksum algorithm adler32 usually occupies
>> a relatively high hot spot, and the SVE instruction set can easily
>> accelerate it, so that the performance of the libz library will be
>> significantly improved.
>>
>> We can divide buf into blocks according to the bit width of SVE,
>> and then use vector registers to perform operations in units of blocks
>> to achieve the purpose of acceleration.
>>
>> On machines that support ARM64 SVE instructions, this algorithm is
>> about 3~4 times faster than the algorithm implemented in C language
>> in libz. The wider the SVE instruction, the better the acceleration effect.
>>
>> Measured on a Taishan 1951 machine that supports 256-bit width SVE,
>> below are the results of my measured random data of 1M and 10M:
>>
>> [root@xxx adler32]# ./benchmark 1000000
>> Libz alg: Time used: 608 us, 1644.7 Mb/s.
>> SVE alg: Time used: 166 us, 6024.1 Mb/s.
>>
>> [root@xxx adler32]# ./benchmark 10000000
>> Libz alg: Time used: 6484 us, 1542.3 Mb/s.
>> SVE alg: Time used: 2034 us, 4916.4 Mb/s.
>>
>> The blocks can be of any size, so the algorithm can automatically adapt
>> to SVE hardware with different bit widths without modifying the code.
>>
>>
>> Signed-off-by: liqiang <liqiang64@xxxxxxxxxx>
>
> Note that this patch does nothing to actually wire up the kernel's copy of libz
> (lib/zlib_{deflate,inflate}/) to use this implementation of Adler32. To do so,
> libz would either need to be changed to use the shash API, or you'd need to
> implement an adler32() function in lib/crypto/ that automatically uses an
> accelerated implementation if available, and make libz call it.
>
> Also, in either case a C implementation would be required too. There can't be
> just an architecture-specific implementation.

Okay, thank you for pointing out these problems and for your suggestions.
I will continue to improve my code.

>
> Also as others have pointed out, there's probably not much point in having an SVE
> implementation of Adler32 when there isn't even a NEON implementation yet. It's
> not too hard to implement Adler32 using NEON, and there are already several
> permissively-licensed NEON implementations out there that could be used as a
> reference, e.g. my implementation using NEON intrinsics here:
> https://github.com/ebiggers/libdeflate/blob/v1.6/lib/arm/adler32_impl.h
>
> - Eric

I am very happy to have this NEON implementation as a reference. :)

--
Best regards,
Li Qiang