On Thu, Nov 05, 2020 at 05:05:53PM +0800, Li Qiang wrote:
>
>
> On 2020/11/5 15:51, Ard Biesheuvel wrote:
> > Note that NEON intrinsics can be compiled for 32-bit ARM as well (with
> > a bit of care - please refer to lib/raid6/recov_neon_inner.c for an
> > example of how to deal with intrinsics that are only available on
> > arm64) and are less error prone, so intrinsics should be preferred if
> > feasible.
> >
> > However, you have still not explained how optimizing Adler32 makes a
> > difference for a real-world use case. Where is libdeflate used on a
> > hot path?
> > .
>
> Sorry :(, I have not specifically searched for the use of this algorithm
> in the kernel.
>
> When I used perf to test the performance of the libz library before,
> I saw that the adler32 algorithm occupies a lot of hot spots. I just
> saw this algorithm used in the kernel code, so I think optimizing this
> algorithm may have some positive optimization effects on the kernel. :)

Adler32 performance is important for zlib compression/decompression, which has
a few use cases in the kernel such as btrfs compression. However, these days
those few kernel use cases are mostly switching to newer algorithms like lz4
and zstd. Also, as I mentioned, your patch doesn't actually wire up your code
to be used by the kernel's implementation of zlib compression/decompression.

I think you'd be much better off contributing to a userspace project, where
DEFLATE/zlib/gzip support still has a long tail of use cases. The official
zlib isn't really being maintained and isn't accepting architecture-specific
optimizations, but there are some performance-oriented forks of zlib (e.g.
https://chromium.googlesource.com/chromium/src/third_party/zlib/ and
https://github.com/zlib-ng/zlib-ng), as well as other projects like libdeflate
(https://github.com/ebiggers/libdeflate). Generally I'm happy to accept
architecture-specific optimizations in libdeflate, but they need to be
testable.

- Eric