On Wed, Sep 12, 2018 at 8:28 AM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> Hi Ondrej,
>
> On Tue, Sep 11, 2018 at 09:42:38AM +0200, Ondrej Mosnacek wrote:
> > This patch rewrites the tweak computation to a slightly simpler method
> > that performs fewer bswaps. Based on performance measurements, the new
> > code seems to provide slightly better performance than the old one.
> >
> > PERFORMANCE MEASUREMENTS (x86_64)
> > Performed using: https://gitlab.com/omos/linux-crypto-bench
> > Crypto driver used: lrw(ecb-aes-aesni)
> >
> > Before:
> > ALGORITHM   KEY (b)   DATA (B)   TIME ENC (ns)   TIME DEC (ns)
> > lrw(aes)        256         64             204             286
> > lrw(aes)        320         64             227             203
> > lrw(aes)        384         64             208             204
> > lrw(aes)        256        512             441             439
> > lrw(aes)        320        512             456             455
> > lrw(aes)        384        512             469             483
> > lrw(aes)        256       4096            2136            2190
> > lrw(aes)        320       4096            2161            2213
> > lrw(aes)        384       4096            2295            2369
> > lrw(aes)        256      16384            7692            7868
> > lrw(aes)        320      16384            8230            8691
> > lrw(aes)        384      16384            8971            8813
> > lrw(aes)        256      32768           15336           15560
> > lrw(aes)        320      32768           16410           16346
> > lrw(aes)        384      32768           18023           17465
> >
> > After:
> > ALGORITHM   KEY (b)   DATA (B)   TIME ENC (ns)   TIME DEC (ns)
> > lrw(aes)        256         64             200             203
> > lrw(aes)        320         64             202             204
> > lrw(aes)        384         64             204             205
> > lrw(aes)        256        512             415             415
> > lrw(aes)        320        512             432             440
> > lrw(aes)        384        512             449             451
> > lrw(aes)        256       4096            1838            1995
> > lrw(aes)        320       4096            2123            1980
> > lrw(aes)        384       4096            2100            2119
> > lrw(aes)        256      16384            7183            6954
> > lrw(aes)        320      16384            7844            7631
> > lrw(aes)        384      16384            8256            8126
> > lrw(aes)        256      32768           14772           14484
> > lrw(aes)        320      32768           15281           15431
> > lrw(aes)        384      32768           16469           16293
> >
> > Signed-off-by: Ondrej Mosnacek <omosnace@xxxxxxxxxx>
> > ---
> >  crypto/lrw.c | 49 +++++++++++++++++++++++++------------------------
> >  1 file changed, 25 insertions(+), 24 deletions(-)
> >
> > diff --git a/crypto/lrw.c b/crypto/lrw.c
> > index 393a782679c7..b4f30b6f16d6 100644
> > --- a/crypto/lrw.c
> > +++ b/crypto/lrw.c
> > @@ -120,30 +120,19 @@ static int setkey(struct crypto_skcipher *parent, const u8 *key,
> >  	return 0;
> >  }
> >
> > -static inline void inc(be128 *iv)
> > +static int next_index(u32 *counter)
> >  {
> > -	be64_add_cpu(&iv->b, 1);
> > -	if (!iv->b)
> > -		be64_add_cpu(&iv->a, 1);
> > -}
> > -
> > -/* this returns the number of consequative 1 bits starting
> > - * from the right, get_index128(00 00 00 00 00 00 ... 00 00 10 FB) = 2 */
> > -static inline int get_index128(be128 *block)
> > -{
> > -	int x;
> > -	__be32 *p = (__be32 *) block;
> > -
> > -	for (p += 3, x = 0; x < 128; p--, x += 32) {
> > -		u32 val = be32_to_cpup(p);
> > -
> > -		if (!~val)
> > -			continue;
> > +	int i, res = 0;
> >
> > -		return x + ffz(val);
> > +	for (i = 0; i < 4; i++) {
> > +		if (counter[i] + 1 != 0) {
> > +			res += ffz(counter[i]++);
> > +			break;
> > +		}
> > +		counter[i] = 0;
> > +		res += 32;
> >  	}
> > -
> > -	return x;
> > +	return res;
> >  }
>
> This looks good, but can you leave the comment that says it returns the number
> of leading 1's in the counter?  And note that it increments the counter too.

Good idea, will do.

> Actually, I think it's wrong in the case where the counter is all 1's and wraps
> around.  It will XOR with ->mulinc[128], which is off the end of the array,
> instead of the correct value of ->mulinc[127]...  But that's an existing bug :-/

Oh, right, good catch!

> (If you do want to fix that, please do it in a separate patch, probably
> preceding this one in the series, and add a test vector that covers it...)

Yeah, will do that.
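A minimal sketch of how next_index() might then look, with the comment
restored and the overflow clamped as suggested above (untested and for
illustration only; the actual patch in the series may differ):

/*
 * Returns the number of trailing 1 bits in the counter, i.e. the index
 * of the ->mulinc[] entry to XOR into the tweak, and increments the
 * counter by one as a side effect.
 */
static int next_index(u32 *counter)
{
        int i, res = 0;

        for (i = 0; i < 4; i++) {
                if (counter[i] + 1 != 0)
                        return res + ffz(counter[i]++);
                counter[i] = 0;
                res += 32;
        }

        /*
         * The counter was all 1's and wrapped around to zero, so res is
         * now 128, one past the end of ->mulinc[].  Per the discussion
         * above, 127 is the correct index for this case.
         */
        return 127;
}

The early return keeps the common no-carry case to a single ffz() call,
while making the overflow case explicit.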
>
> >
> >  static int post_crypt(struct skcipher_request *req)
> > @@ -209,8 +198,9 @@ static int pre_crypt(struct skcipher_request *req)
> >  	struct scatterlist *sg;
> >  	unsigned cryptlen;
> >  	unsigned offset;
> > -	be128 *iv;
> >  	bool more;
> > +	__u32 *iv;
> > +	u32 counter[4];
>
> 'iv' should be '__be32 *', not '__u32 *'.

Yep.

>
> >  	int err;
> >
> >  	subreq = &rctx->subreq;
> > @@ -227,6 +217,11 @@ static int pre_crypt(struct skcipher_request *req)
> >  	err = skcipher_walk_virt(&w, subreq, false);
> >  	iv = w.iv;
> >
> > +	counter[0] = be32_to_cpu(iv[3]);
> > +	counter[1] = be32_to_cpu(iv[2]);
> > +	counter[2] = be32_to_cpu(iv[1]);
> > +	counter[3] = be32_to_cpu(iv[0]);
> > +
> >  	while (w.nbytes) {
> >  		unsigned int avail = w.nbytes;
> >  		be128 *wsrc;
> > @@ -242,10 +237,16 @@ static int pre_crypt(struct skcipher_request *req)
> >  			/* T <- I*Key2, using the optimization
> >  			 * discussed in the specification */
> >  			be128_xor(&rctx->t, &rctx->t,
> > -				  &ctx->mulinc[get_index128(iv)]);
> > -			inc(iv);
> > +				  &ctx->mulinc[next_index(counter)]);
> >  		} while ((avail -= bs) >= bs);
> >
> > +		if (w.nbytes == w.total) {
> > +			iv[0] = cpu_to_be32(counter[3]);
> > +			iv[1] = cpu_to_be32(counter[2]);
> > +			iv[2] = cpu_to_be32(counter[1]);
> > +			iv[3] = cpu_to_be32(counter[0]);
> > +		}
> > +
> >  		err = skcipher_walk_done(&w, avail);
> >  	}
> >
> > --
> > 2.17.1
> >
>
> - Eric

Thanks,

--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.
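As a footnote on the byte-order handling in pre_crypt() above: the IV is
a 128-bit big-endian value, while counter[] holds it as four host-endian
words with counter[0] least significant, which is why the word order is
reversed on both load and store. A stand-alone sketch of that round trip
(hypothetical user-space code, not from the patch; be32_to_cpu() and
cpu_to_be32() are approximated with __builtin_bswap32(), assuming a
little-endian host):

#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the kernel's be32_to_cpu()/cpu_to_be32() on a
 * little-endian host. */
static uint32_t be32_to_cpu(uint32_t v) { return __builtin_bswap32(v); }
static uint32_t cpu_to_be32(uint32_t v) { return __builtin_bswap32(v); }

int main(void)
{
        /* 128-bit IV as four big-endian words, iv[0] most significant;
         * here the low 32 bits are all 1's. */
        uint32_t iv[4] = { cpu_to_be32(0), cpu_to_be32(0),
                           cpu_to_be32(0), cpu_to_be32(0xffffffff) };
        uint32_t counter[4];

        /* Load: counter[0] is the least-significant word, hence iv[3]. */
        counter[0] = be32_to_cpu(iv[3]);
        counter[1] = be32_to_cpu(iv[2]);
        counter[2] = be32_to_cpu(iv[1]);
        counter[3] = be32_to_cpu(iv[0]);

        /* Incrementing the all-ones low word carries into counter[1]. */
        if (++counter[0] == 0)
                counter[1]++;

        /* Store back in the same reversed order. */
        iv[0] = cpu_to_be32(counter[3]);
        iv[1] = cpu_to_be32(counter[2]);
        iv[2] = cpu_to_be32(counter[1]);
        iv[3] = cpu_to_be32(counter[0]);

        /* Prints "00000000 00000000 00000001 00000000". */
        printf("%08x %08x %08x %08x\n",
               be32_to_cpu(iv[0]), be32_to_cpu(iv[1]),
               be32_to_cpu(iv[2]), be32_to_cpu(iv[3]));
        return 0;
}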