On Sat, 2 Nov 2024 at 11:46, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Sat, 2 Nov 2024 at 11:20, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Sat, Nov 02, 2024 at 10:58:53AM +0100, Ard Biesheuvel wrote:
> > >
> > > At least btrfs supports a variety of checksums/hashes (crc32c, xxhash,
> > > sha) via the shash API.
> >
> > OK, given that btrfs is still doing this, I think we should still
> > register crc32c-arch conditionally. Having it point to crc32c-generic
> > is just confusing (at least if you use btrfs).
> >
>
> Agreed. So we should take this patch.
>
> The current issue with btrfs is that it will misidentify
> crc32c-generic on arm64 as being 'slow', but this was already fixed by
> my patches that are already in cryptodev.
>
> On arm64, crc32 instructions are always available (the only known
> micro-architecture that omitted them has been obsolete for years), and
> on x86_64 the situation is similar in practice (introduced in SSE
> 4.2), and so this patch changes very little for the majority of btrfs
> users.
>
> But on architectures such as 32-bit ARM, where these instructions are
> only available if you are booting a 32-bit kernel on a 64-bit CPU
> (which is more common than you might think), this patch will ensure
> that crc32-arm / crc32c-arm are only registered if the instructions
> are actually available, and btrfs will take the slow async path for
> checksumming if they are not. (I seriously doubt that btrfs on 32-bit
> ARM is a thing but who knows)

(actually, backpedalling a little bit - apologies)

OTOH, btrfs is the only user where this makes a difference, and its use
of the driver name is highly questionable IMO. On x86, it shouldn't
make a difference in practice; on arm64, it was broken for a long time;
and on the remaining architectures, I seriously doubt that anyone cares
about this, so we can fix this properly if there is a need.

The only issue resulting from *not* taking this patch is that btrfs may
misidentify the CRC32 implementation as being 'slow' and take an
alternative code path, which does not necessarily result in worse
performance.

And I'd prefer static_call() / static_call_query() over a separate
global variable to keep track of whether or not the generic code is in
use.
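
To illustrate that last point, here is a rough userspace analogy. It is
not kernel code: static_call() and static_call_query() only exist in
the kernel, so a plain function pointer stands in for the static call,
and every name below (crc32c_select, crc32c_is_generic, crc32c_arch) is
made up for this sketch. The idea is simply that the current call
target already tells you whether the generic code is in use, so no
separate flag has to be kept in sync with it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32c_fn)(uint32_t crc, const uint8_t *p, size_t len);

/* Bitwise reference CRC32C (reflected Castagnoli polynomial). */
static uint32_t crc32c_generic(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
    }
    return crc;
}

/*
 * Hypothetical stand-in for an accelerated routine. A real one would
 * use the CPU's CRC instructions; this placeholder just reuses the
 * generic code so that the sketch runs anywhere.
 */
static uint32_t crc32c_arch(uint32_t crc, const uint8_t *p, size_t len)
{
    return crc32c_generic(crc, p, len);
}

/* Plays the role of the static call target; defaults to generic. */
static crc32c_fn crc32c_impl = crc32c_generic;

/* Analogous to updating the static call once CPU features are known. */
void crc32c_select(bool have_crc_insns)
{
    crc32c_impl = have_crc_insns ? crc32c_arch : crc32c_generic;
}

/* Analogous to querying the static call: inspect the target itself. */
bool crc32c_is_generic(void)
{
    return crc32c_impl == crc32c_generic;
}

/* Standard CRC32C framing: all-ones seed, inverted result. */
uint32_t crc32c(const uint8_t *p, size_t len)
{
    return crc32c_impl(0xFFFFFFFFu, p, len) ^ 0xFFFFFFFFu;
}
```

A caller that wants to know whether it is getting the slow path would
ask crc32c_is_generic() rather than parse a driver name or consult a
second variable.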