On 12/12/2020 18:54, Ard Biesheuvel wrote:
> On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>>
>> On Fri, 11 Dec 2020 at 20:07, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>>>
>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>>>> crc32c-intel and crc32c-generic are both registered, system will
>>>> use crc32c-intel because its .cra_priority is greater than
>>>> crc32c-generic. This case expect to use crc32c-generic driver for
>>>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
>>>> CPUs support from crc32c-intel.
>>>>
>>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@xxxxxxxxxxx>
>>>
>>> Does this mean that the performance of the crc32c instruction on those CPUs is
>>> actually slower than a regular C implementation? That's very weird.
>>>
>>
>> This driver does not use CRC instructions, but carryless
>> multiplication and aggregation. So I suppose the pclmulqdq instruction
>> triggers some pathological performance limitation here.
>>
>
> Just noticed it uses both crc instructions and pclmulqdq instructions.
> Sorry for the noise.
>
>> That means the crct10dif driver probably needs the same treatment.
>
> Tony, can you confirm that the problem is in the CRC instructions and
> not in the PCLMULQDQ code path that supersedes it when available?

CRC instructions.

sincerely
Tony
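
For readers following the thread, here is a rough toy model of the selection behaviour described in the patch: when several implementations of "crc32c" are registered, the one with the highest priority wins, so keeping the accelerated driver from registering on some CPUs makes the generic one the choice there. This is only a Python sketch, not kernel code. The names `Impl`, `pick` and `registered_for`, the `cpu_excluded` flag, and the priority values are illustrative assumptions, not taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impl:
    alg_name: str     # generic algorithm name, e.g. "crc32c"
    driver_name: str  # name of the specific implementation
    priority: int     # stands in for .cra_priority (illustrative value)


# Illustrative priorities; only their relative order matters here.
GENERIC = Impl("crc32c", "crc32c-generic", 100)
INTEL = Impl("crc32c", "crc32c-intel", 200)


def registered_for(cpu_has_sse42: bool, cpu_excluded: bool) -> list:
    """Model driver init: the accelerated driver registers only when the
    CPU has the feature and is not on a (hypothetical) exclusion list."""
    impls = [GENERIC]
    if cpu_has_sse42 and not cpu_excluded:
        impls.append(INTEL)
    return impls


def pick(registered: list, alg_name: str) -> Impl:
    """Return the highest-priority registered implementation of alg_name."""
    candidates = [i for i in registered if i.alg_name == alg_name]
    if not candidates:
        raise LookupError(alg_name)
    return max(candidates, key=lambda i: i.priority)


# CPU with the feature and no exclusion: the accelerated driver is chosen.
print(pick(registered_for(True, False), "crc32c").driver_name)
# Same feature bit, but excluded: selection falls back to the generic driver.
print(pick(registered_for(True, True), "crc32c").driver_name)
```

The point of the sketch is that nothing about the priority ordering has to change: excluding a CPU at registration time is enough to alter which implementation a lookup by algorithm name returns.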