On Thu, May 11, 2023 at 3:30 AM Heiko Stübner <heiko@xxxxxxxxx> wrote: > > Hi Nathan, > > Am Dienstag, 11. April 2023, 17:00:00 CEST schrieb Nathan Huckleberry: > > On Wed, Mar 29, 2023 at 7:08 AM Heiko Stuebner <heiko@xxxxxxxxx> wrote: > > > +struct riscv64_ghash_ctx { > > > + void (*ghash_func)(u64 Xi[2], const u128 Htable[16], > > > + const u8 *inp, size_t len); > > > + > > > + /* key used by vector asm */ > > > + u128 htable[16]; > > > > This field looks too big. The assembly only loads the first 128-byte > > value from this table. > > OpenSSL defines the Htable field handed to the init- and the other > functions as this "u128 Htable[16]" [0] . As I really like the concept > of keeping in sync with openSSL, I guess I'd rather not change that. > > [0] https://github.com/openssl/openssl/blob/master/crypto/modes/gcm128.c#L88 > > > > Is this copied from another implementation? There's an optimization > > where you precompute the first N powers of H so that you can perform 1 > > finite field reduction for every N multiplications, but it doesn't > > look like that's being used here. > > The whole crypto-specific code comes from openSSL itself, so for now I > guess I'd like to try keeping things the same. > > > > > +#define RISCV64_ZBC_SETKEY(VARIANT, GHASH) \ > > > +void gcm_init_rv64i_ ## VARIANT(u128 Htable[16], const u64 Xi[2]); \ > > > +static int riscv64_zbc_ghash_setkey_ ## VARIANT(struct crypto_shash *tfm, \ > > > + const u8 *key, \ > > > + unsigned int keylen) \ > > > +{ \ > > > + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm)); \ > > > + const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]), \ > > > + cpu_to_be64(((const u64 *)key)[1]) }; \ > > > + \ > > > + if (keylen != GHASH_BLOCK_SIZE) \ > > > + return -EINVAL; \ > > > + \ > > > + memcpy(&ctx->key, key, GHASH_BLOCK_SIZE); \ > > > + gcm_init_rv64i_ ## VARIANT(ctx->htable, k); \ > > > + \ > > > + ctx->ghash_func = gcm_ghash_rv64i_ ## GHASH; \ > > > + \ > > > + return 0; \ > > > +} > > > > I'd prefer three identical functions over a macro here. Code searching > > tools and compiler warnings are significantly worse with macros. > > done :-) > > > > > + > > > +static int riscv64_zbc_ghash_update(struct shash_desc *desc, > > > + const u8 *src, unsigned int srclen) > > > +{ > > > + unsigned int len; > > > + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); > > > + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); > > > + > > > + if (dctx->bytes) { > > > + if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) { > > > + memcpy(dctx->buffer + dctx->bytes, src, > > > + srclen); > > > + dctx->bytes += srclen; > > > + return 0; > > > + } > > > + memcpy(dctx->buffer + dctx->bytes, src, > > > + GHASH_DIGEST_SIZE - dctx->bytes); > > > + > > > + ctx->ghash_func(dctx->shash, ctx->htable, > > > + dctx->buffer, GHASH_DIGEST_SIZE); > > > + > > > + src += GHASH_DIGEST_SIZE - dctx->bytes; > > > + srclen -= GHASH_DIGEST_SIZE - dctx->bytes; > > > + dctx->bytes = 0; > > > + } > > > + len = srclen & ~(GHASH_DIGEST_SIZE - 1); > > > + > > > + if (len) { > > > + gcm_ghash_rv64i_zbc(dctx->shash, ctx->htable, > > > + src, len); > > > + src += len; > > > + srclen -= len; > > > + } > > > + > > > + if (srclen) { > > > + memcpy(dctx->buffer, src, srclen); > > > + dctx->bytes = srclen; > > > + } > > > + return 0; > > > +} > > > + > > > +static int riscv64_zbc_ghash_final(struct shash_desc *desc, u8 *out) > > > +{ > > > + int i; > > > + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); > > > + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); > > > + > > > + if (dctx->bytes) { > > > + for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++) > > > + dctx->buffer[i] = 0; > > > + ctx->ghash_func(dctx->shash, ctx->htable, > > > + dctx->buffer, GHASH_DIGEST_SIZE); > > > > Can we do this without an indirect call? > > hmm, the indirect call is in both riscv64_zbc_ghash_update() and > riscv64_zbc_ghash_final() . And I found a missing one at the bottom > of riscv64_zbc_ghash_update(), where gcm_ghash_rv64i_zbc() is > called right now. > > Getting rid of the indirect call would mean duplicating both of these > functions for all instances. Especially with the slightly higher > complexity of the update this somehow seems not the best way to go. Indirect calls are quite expensive. They are an issue for things like disk/filesystem encryption because it introduces a ton of latency per block. I think this is a candidate for static calls. It looks like static call support hasn't been accepted for riscv yet. Maybe just add a TODO for now? See: https://lwn.net/Articles/771209/ https://lore.kernel.org/all/tencent_A8A256967B654625AEE1DB222514B0613B07@xxxxxx/ > > > Thanks for your pointers > Heiko > > Thanks, Huck