Re: [PATCH 5/6] crypto: x86/sm3 - add AVX assembly implementation

On 12/21/21 2:03 AM, Jussi Kivilinna wrote:
On 20.12.2021 10.22, Tianjia Zhang wrote:
This patch adds an AVX assembly accelerated implementation of the SM3 secure
hash algorithm. Benchmark data shows a performance increase of up to 38%
over the pure software implementation sm3-generic.

The main algorithm implementation is based on the SM3 AVX/BMI2 accelerated
work in libgcrypt at:
https://gnupg.org/software/libgcrypt/index.html

Benchmark on an Intel i5-6200U @ 2.30GHz, comparing the performance of the
two implementations: pure software sm3-generic and the sm3-avx acceleration.
The data comes from tcrypt modes 326 and 422. The columns are the different
per-update lengths in bytes. The data is tabulated and the unit is MB/s:

update-size |     16      64     256    1024    2048    4096    8192
--------------------------------------------------------------------
sm3-generic | 105.97  129.60  182.12  189.62  188.06  193.66  194.88
sm3-avx     | 119.87  163.05  244.44  260.92  257.60  264.87  265.88

Signed-off-by: Tianjia Zhang <tianjia.zhang@xxxxxxxxxxxxxxxxx>
---
  arch/x86/crypto/Makefile         |   3 +
  arch/x86/crypto/sm3-avx-asm_64.S | 521 +++++++++++++++++++++++++++++++
  arch/x86/crypto/sm3_avx_glue.c   | 134 ++++++++
  crypto/Kconfig                   |  13 +
  4 files changed, 671 insertions(+)
  create mode 100644 arch/x86/crypto/sm3-avx-asm_64.S
  create mode 100644 arch/x86/crypto/sm3_avx_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index f307c93fc90a..7cbe860f6201 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -88,6 +88,9 @@ nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o
  obj-$(CONFIG_CRYPTO_CURVE25519_X86) += curve25519-x86_64.o
+obj-$(CONFIG_CRYPTO_SM3_AVX_X86_64) += sm3-avx-x86_64.o
+sm3-avx-x86_64-y := sm3-avx-asm_64.o sm3_avx_glue.o
+
  obj-$(CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64) += sm4-aesni-avx-x86_64.o
  sm4-aesni-avx-x86_64-y := sm4-aesni-avx-asm_64.o sm4_aesni_avx_glue.o
diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S
new file mode 100644
index 000000000000..e7a9a37f3609
--- /dev/null
+++ b/arch/x86/crypto/sm3-avx-asm_64.S
@@ -0,0 +1,521 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * SM3 AVX accelerated transform.
+ * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
+ *
+ * Copyright (C) 2021 Jussi Kivilinna <jussi.kivilinna@xxxxxx>
+ * Copyright (C) 2021 Tianjia Zhang <tianjia.zhang@xxxxxxxxxxxxxxxxx>
+ */
<snip>
+
+#define R(i, a, b, c, d, e, f, g, h, round, widx, wtype)                      \
+    /* rol(a, 12) => t0 */                                                \
+    roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
+    /* rol (t0 + e + t), 7) => t1 */                                      \
+    addl3(t0, e, t1);                                                     \
+    addl $K##round, t1;                                                   \

It's better to use "leal K##round(t0, e, 1), t1;" here and fix the K0-K63
macros instead, as I noted on the libgcrypt mailing list:
  https://lists.gnupg.org/pipermail/gcrypt-devel/2021-December/005209.html

-Jussi

Thanks for pointing it out, I will fix it in the next patch.
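
For reference, the change amounts to roughly the following (just a sketch;
t0, e and t1 stand for the macro's 32-bit register arguments, K0 for one of
the per-round constants, and addl3() is assumed to expand to a mov followed
by an add):

    /* current form: three instructions to get t1 = t0 + e + K0 */
    movl  t0, t1;           /* addl3(t0, e, t1): copy t0 ...  */
    addl  e, t1;            /* ... then add e                 */
    addl  $K0, t1;          /* finally add the round constant */

    /* suggested form: a single lea computes t1 = t0 + e*1 + K0,
     * folding the constant in as the displacement */
    leal  K0(t0, e, 1), t1;

That drops the t1 computation from three instructions to one per round,
provided the K0-K63 macros expand to plain constants usable as an lea
displacement.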

Best regards,
Tianjia


