On Fri Mar 1, 2024 at 10:48 PM EET, Stefan Berger wrote: > > > On 3/1/24 15:26, Jarkko Sakkinen wrote: > > On Thu Feb 29, 2024 at 4:57 PM EET, Stefan Berger wrote: > >> > >> > >> On 2/29/24 04:11, Lukas Wunner wrote: > >>> On Fri, Feb 23, 2024 at 03:41:40PM -0500, Stefan Berger wrote: > >>>> +static inline void ecc_digits_from_bytes(const u8 *in, unsigned int nbytes, > >>>> + u64 *out, unsigned int ndigits) > >>>> +{ > >>>> + unsigned int sz = ndigits << ECC_DIGITS_TO_BYTES_SHIFT; > >>>> + u8 tmp[ECC_MAX_DIGITS << ECC_DIGITS_TO_BYTES_SHIFT]; > >>>> + unsigned int o = sz - nbytes; > >>>> + > >>>> + memset(tmp, 0, o); > >>>> + memcpy(&tmp[o], in, nbytes); > >>>> + ecc_swap_digits(tmp, out, ndigits); > >>>> +} > >>> > >>> Copying the whole key into tmp seems inefficient. You only need > >>> special handling for the first few bytes of "in" (6 bytes in the > >>> P521 case) and could use ecc_swap_digits() to convert the rest > >>> of "in" directly to "out" without using tmp. > >>> > >>> So it would be sufficient to allocate the first digit on the stack, > >>> memset + memcpy, then convert that to native byte order into "in[0]" > >>> and use ecc_swap_digits() for the rest. > >>> > >>> And the special handling would be conditional on "!o", so is skipped > >>> for existing curves. > >> > >> Thanks. It looks like this now: > >> > >> static inline void ecc_digits_from_bytes(const u8 *in, unsigned int nbytes, > >> u64 *out, unsigned int ndigits) > >> { > >> unsigned int o = nbytes & 7; > >> u64 msd = 0; > >> size_t i; > >> > >> if (o == 0) { > >> ecc_swap_digits(in, out, ndigits); > >> } else { > >> for (i = 0; i < o; i++) > >> msd = (msd << 8) | in[i]; > >> out[ndigits - 1] = msd; > >> ecc_swap_digits(&in[o], out, ndigits - 1); > > > > This would be more stream-lined IMHO: > > > > unsigned int o = nbytes & 7; > > unsigned int n = ndigits; > > u64 msd = 0; > > size_t i; > > > > if (o != 0) { > > for (i = 0; i < o; i++) > > msd = (msd << 8) | in[i]; > > > > out[--n] = msd; > > } > > > > ecc_swap_digits(in, out, n); > > You forgot to advance 'in'. yeah, pointing out that two call sites is unnecessary complexity, that's all BR, Jarkko