On Sat, Nov 02, 2024 at 07:08:43PM +0800, Herbert Xu wrote: > On Sat, Nov 02, 2024 at 12:05:01PM +0100, Ard Biesheuvel wrote: > > > > The only issue resulting from *not* taking this patch is that btrfs > > may misidentify the CRC32 implementation as being 'slow' and take an > > alternative code path, which does not necessarily result in worse > > performance. > > If we were removing crc32* (or at least crc32*-arch) from the Crypto > API then these patches would be redundant. But if we're keeping them > because btrfs uses them then we should definitely make crc32*-arch > do the right thing. IOW they should not be registered if they're > the same as crc32*-generic. > > Thanks, I would like to eventually remove crc32 and crc32c from the crypto API, but it will take some time to get all the users converted. If there are AF_ALG users it could even be impossible, though the usual culprit, iwd, doesn't appear to use any CRCs, so hopefully we are fine there. I will plan to keep this patch, but change it to use a crc32_optimizations() function instead which was Ard's first suggestion. I don't think Ard's static_call suggestion would work as-is, since considering the following: static inline u32 __pure crc32_le(u32 crc, const u8 *p, size_t len) { if (IS_ENABLED(CONFIG_CRC32_ARCH)) return static_call(crc32_le_arch)(crc, p, len); return crc32_le_base(crc, p, len); } ... the 'static_call(crc32_le_arch)(crc, p, len)' will be inlined into every user, which could be a loadable module which gets loaded after crc32-${arch}.ko. And AFAIK, static calls in that module won't be updated in that case. That could be avoided by making crc32_le() a non-inline function in lib/crc32.c, so the static call would only be in that one place. That has the slight disadvantage that it would introduce an extra jump into the common case where the optimized function is enabled. Considering that some users are passing small amounts of data into the CRC functions (e.g., 4 bytes), I would like to minimize the overhead as much as possible. It could also be avoided by making CRC32 and CRC32_ARCH bool rather than tristate. I would prefer not to do that, since there can be situations where only loadable modules need these functions so they should not have to be built into the core kernel. So I plan to go with the crc32_optimizations() solution in v3. - Eric