Re: [PATCH 16/16] crypto: arm64/sm4 - add ARMv9 SVE cryptography acceleration implementation

Tianjia Zhang <tianjia.zhang@xxxxxxxxxxxxxxxxx> · Tue, 27 Sep 2022 12:26:29 +0800

Hi Ard,

On 9/26/22 6:02 PM, Ard Biesheuvel wrote:
(cc Mark Brown)

Hello Tianjia,

On Mon, 26 Sept 2022 at 11:37, Tianjia Zhang
<tianjia.zhang@xxxxxxxxxxxxxxxxx> wrote:

Scalable Vector Extension (SVE) is the next-generation SIMD extension for
arm64. SVE allows flexible vector length implementations with a range of
possible values in CPU implementations. The vector length can vary from a
minimum of 128 bits up to a maximum of 2048 bits, at 128-bit increments.
The SVE design guarantees that the same application can run on different
implementations that support SVE, without the need to recompile the code.

SVE was originally introduced by ARMv8, and ARMv9 introduced SVE2 to
expand and improve it. Similar to the Crypto Extension supported by the
NEON instruction set for the algorithm, SVE also supports the similar
instructions, called cryptography acceleration instructions, but this is
also optional instruction set.

This patch uses SM4 cryptography acceleration instructions and SVE2
instructions to optimize the SM4 algorithm for ECB/CBC/CFB/CTR modes.
Since the encryption of CBC/CFB cannot be parallelized, the Crypto
Extension instruction is used.

Given that we currently do not support the use of SVE in kernel mode,
this patch cannot be accepted at this time (but the rest of the series
looks reasonable to me, although I have only skimmed over the patches)

In view of the disappointing benchmark results below, I don't think
this is worth the hassle at the moment. If we can find a case where
using SVE in kernel mode truly makes a [favorable] difference, we can
revisit this, but not without a thorough analysis of the impact it
will have to support SVE in the kernel. Also, the fact that SVE may
also cover cryptographic extensions does not necessarily imply that a
micro-architecture will perform those crypto transformations in
parallel and so the performance may be the same even if VL > 128.

In summary, please drop this patch for now, and once there are more
encouraging performance numbers, please resubmit it as part of a
series that explicitly enables SVE in kernel mode on arm64, and
documents the requirements and constraints.

I have cc'ed Mark who has been working on the SVE support., who might
have something to add here as well.

Thanks,
Ard.

Thanks for your reply, the current performance of SVE is really
unsatisfactory. One reason is that the optimization of SVE needs to deal
with more and more complex data shifting operations, such as in CBC/CFB
mode, but also in CTR mode. needing more instruction to complete the
128-bit count increment, and the use of CE optimization does not have
these complications.

In addition, I naively thought that when the VL is 256-bit, the
performance will simply double compared to 128-bit. At present, this is
not the case. Maybe it is worth using SVE until there are significantly
improved performance data. I'll follow your advice and drop this
patch.

Best regards,
Tianjia