On Thu, Feb 29, 2024 at 09:19:59PM -0500, Stefan Berger wrote:
> +static void vli_mmod_fast_521(u64 *result, const u64 *product,
> +				const u64 *curve_prime, u64 *tmp)
> +{
> +	const unsigned int ndigits = 9;
> +	size_t i;
> +
> +	for (i = 0; i < ndigits; i++)
> +		tmp[i] = product[i];
> +	tmp[8] &= 0x1ff;

Hm, the other vli_mmod_fast_*() functions manually unroll those loops.
Wondering if that would make sense here as well?

It's also possible to tell gcc to unroll a loop with a per-function...

    __attribute__((optimize("unroll-loops")))

...but I'm not sure about clang portability.

> @@ -941,6 +966,12 @@ static bool vli_mmod_fast(u64 *result, u64 *product,
> +	case 9:
> +		if (!strcmp(curve->name, "nist_521")) {
> +			vli_mmod_fast_521(result, product, curve_prime, tmp);
> +			break;
> +		}
> +		fallthrough;

If you reorder patch 4 and 5, you could check for curve->nbits == 521
here, which might be cheaper than the string comparison.

> -#define ECC_MAX_DIGITS              (512 / 64) /* due to ecrdsa */
> +#define ECC_MAX_DIGITS              (576 / 64) /* due to NIST P521 */

Maybe DIV_ROUND_UP(521, 64) for clarity?

Thanks,

Lukas
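
For illustration only, here is a rough user-space sketch of two of the suggestions above: the manually unrolled copy of the low 521 bits, and the digit count written as DIV_ROUND_UP(521, 64). It is not taken from the patch. The names P521_NDIGITS and p521_copy_low_bits() are hypothetical, uint64_t stands in for the kernel's u64, and DIV_ROUND_UP() is redefined locally so the snippet builds outside the kernel tree.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

/*
 * Local stand-in for the kernel's DIV_ROUND_UP() so that this sketch
 * compiles in user space.
 */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical name; the patch itself changes ECC_MAX_DIGITS. */
#define P521_NDIGITS DIV_ROUND_UP(521, 64)

/* 521 bits = 8 full 64-bit digits plus 9 bits, hence nine digits. */
static_assert(P521_NDIGITS == 9, "P-521 needs nine 64-bit digits");

/*
 * Hypothetical helper: copy the low 521 bits of product into tmp with
 * the loop written out by hand. Only the low 9 bits of the top digit
 * belong to the 521-bit value, so it is masked with 0x1ff.
 */
static void p521_copy_low_bits(uint64_t *tmp, const uint64_t *product)
{
	tmp[0] = product[0];
	tmp[1] = product[1];
	tmp[2] = product[2];
	tmp[3] = product[3];
	tmp[4] = product[4];
	tmp[5] = product[5];
	tmp[6] = product[6];
	tmp[7] = product[7];
	tmp[8] = product[8] & 0x1ff;
}
```

Whether the hand-unrolled form is actually faster than the loop would need to be measured; the compiler may well unroll a nine-iteration loop with a constant bound on its own.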