Hi,

It seems the CTR bug is due to crypto_ctr_crypt_inplace() and
crypto_ctr_crypt_segment() always returning 0. For the 4096-byte test
case I previously described, they should return 8 when walk.nbytes =
4008 and refrain from processing the last 8 bytes. (This is based on
my current understanding of blkcipher; I may be wrong.)

To resolve it, I added a "remain" parameter to both
crypto_ctr_crypt_inplace() and crypto_ctr_crypt_segment(). This
parameter represents a remainder modulo bsize. When "remain" is zero,
both functions encrypt the full walk.nbytes of data. When "remain" is
positive, both functions encrypt up to the block boundary (i.e.
walk.nbytes - walk.nbytes % bsize) and leave the "remain"-ing bytes
for the next loop.

The nbytes parameter in crypto_ctr_crypt() keeps track of how many
bytes remain to be processed. If walk.nbytes >= nbytes, we know we are
at the last few blocks of the request and can set "remain" to 0.
Otherwise we set "remain" to walk.nbytes % bsize.

Attached is the patch for ctr.c. (Note that it is against commit
19f4711b7a8f4bf5457a52873f82985a3762cd48, as I am having problems with
the latest givcipher patches on UML. Sorry about that.)

Signed-off-by: Tan Swee Heng <thesweeheng@xxxxxxxxx>
---
By the way, such an approach is useful for Salsa20 too, as eSTREAM
provides encrypt_blocks() and encrypt_bytes() APIs, which are
essentially what I am trying to simulate with the "remain" parameter.

On Nov 25, 2007 4:07 AM, Tan Swee Heng <thesweeheng@xxxxxxxxx> wrote:
> I seem to have encountered a bug when encrypting large test vectors.
>
> TEST SETTINGS
> I modified tcrypt.c to "ctr(aes,4,8,4)"-encrypt a single large test
> vector defined as:
>     plaintext = 00...00 (4096 bytes)
>     key       = 00...00 (32 bytes)
>     nonce     = 00...00 (4 bytes)
>     iv        = 00...00 (8 bytes)
>     counter   = 4 bytes starting from 00...01.
>
> OUTPUT
> The output I obtained (using a modified hexdump() in tcrypt.c) is this:
> 3992 : e2 0e 5f 84 e9 05 ea 8f
> 4000 : 75 f7 2a ac 4f e9 70 12
> 4008 : 80 88 db 7a 50 0b 22 cd
> 4016 : 2d 23 c8 d1 90 8b 43 14
> 4024 : ae 69 5c 8a 49 ba e9 74
> 4032 : 71 2a 62 a2 14 14 df 86
> 4040 : a6 4c 51 5d 9a cc c3 9e
> 4048 : cd e5 24 14 68 8e 5b f6
> 4056 : 4d 4f 06 c6 f1 7c 29 03
> 4064 : ab c7 50 b2 8e da 2e 34
> 4072 : c7 e9 d2 50 99 86 32 d4
> 4080 : 44 35 62 42 ef 04 05 8d
> 4088 : 4c af 3c 8e be b9 f2 48
>
> Compare with what I obtained using Crypto++:
> 3992 : e2 0e 5f 84 e9 05 ea 8f
> 4000 : 75 f7 2a ac 4f e9 70 12
> 4008 : 67 3e 05 ba c1 56 20 99
> 4016 : 80 88 db 7a 50 0b 22 cd
> 4024 : 2d 23 c8 d1 90 8b 43 14
> 4032 : ae 69 5c 8a 49 ba e9 74
> 4040 : 71 2a 62 a2 14 14 df 86
> 4048 : a6 4c 51 5d 9a cc c3 9e
> 4056 : cd e5 24 14 68 8e 5b f6
> 4064 : 4d 4f 06 c6 f1 7c 29 03
> 4072 : ab c7 50 b2 8e da 2e 34
> 4080 : c7 e9 d2 50 99 86 32 d4
> 4088 : 44 35 62 42 ef 04 05 8d
>
> As we can see, half a block (8 bytes) went "missing" at the start of
> byte 4008.
>
> DIAGNOSIS
> I added a printk() after "while (walk.nbytes)" in crypto_ctr_crypt().
> On my system, it shows that the 4096-byte buffer is broken into two
> parts such that the first walk.nbytes = 4008 and the second
> walk.nbytes = 88. Since 4008 % 16 = 8, bytes 4000 to 4007 are XORed
> with the first 8 bytes of AES(counter N). The remaining 8 bytes of
> AES(counter N) are still available and are meant for bytes 4008 to
> 4015. However, they are discarded when N is incremented. Thus, when
> the second part is encrypted, bytes 4008 to 4015 are XORed with the
> first 8 bytes of AES(counter N+1) instead. The rest of the stream
> slips as a result.
>
> RESOLUTION
> One idea is to keep state on how many bytes in the AES(counter N)
> buffer have been consumed, so that N is not incremented
> unnecessarily. But there may be inefficiencies associated with these
> additional checks.
> If anyone has better ideas, please feel free to post them. (Note
> that salsa20_generic.c has the same problem, so any great ideas for
> CTR would be useful to me too. :-)
>
> Swee Heng
>
diff --git a/crypto/ctr.c b/crypto/ctr.c
index b974a9f..550c44e 100644
--- a/crypto/ctr.c
+++ b/crypto/ctr.c
@@ -106,7 +106,8 @@ static int crypto_ctr_setkey(struct crypto_tfm *parent, const u8 *key,
 
 static int crypto_ctr_crypt_segment(struct blkcipher_walk *walk,
 				    struct crypto_cipher *tfm, u8 *ctrblk,
-				    unsigned int countersize)
+				    unsigned int countersize,
+				    unsigned int remain)
 {
 	void (*fn)(struct crypto_tfm *, u8 *, const u8 *) =
 		   crypto_cipher_alg(tfm)->cia_encrypt;
@@ -116,9 +117,12 @@ static int crypto_ctr_crypt_segment(struct blkcipher_walk *walk,
 	u8 *keystream = (u8 *)ALIGN((unsigned long)ks, alignmask + 1);
 	u8 *src = walk->src.virt.addr;
 	u8 *dst = walk->dst.virt.addr;
-	unsigned int nbytes = walk->nbytes;
+	unsigned int nbytes = walk->nbytes - remain;
 
 	do {
+		/* increment counter in counterblock */
+		ctr_inc_quad(ctrblk + (bsize - countersize), countersize);
+
 		/* create keystream */
 		fn(crypto_cipher_tfm(tfm), keystream, ctrblk);
 		xor_quad(keystream, src, min(nbytes, bsize));
@@ -126,9 +130,6 @@ static int crypto_ctr_crypt_segment(struct blkcipher_walk *walk,
 		/* copy result into dst */
 		memcpy(dst, keystream, min(nbytes, bsize));
 
-		/* increment counter in counterblock */
-		ctr_inc_quad(ctrblk + (bsize - countersize), countersize);
-
 		if (nbytes < bsize)
 			break;
 
@@ -143,25 +144,26 @@
 
 static int crypto_ctr_crypt_inplace(struct blkcipher_walk *walk,
 				    struct crypto_cipher *tfm, u8 *ctrblk,
-				    unsigned int countersize)
+				    unsigned int countersize,
+				    unsigned int remain)
 {
 	void (*fn)(struct crypto_tfm *, u8 *, const u8 *) =
 		   crypto_cipher_alg(tfm)->cia_encrypt;
 	unsigned int bsize = crypto_cipher_blocksize(tfm);
 	unsigned long alignmask = crypto_cipher_alignmask(tfm);
-	unsigned int nbytes = walk->nbytes;
+	unsigned int nbytes = walk->nbytes - remain;
 	u8 *src = walk->src.virt.addr;
 	u8 ks[bsize + alignmask];
 	u8 *keystream = (u8 *)ALIGN((unsigned long)ks, alignmask + 1);
 
 	do {
-		/* create keystream */
-		fn(crypto_cipher_tfm(tfm), keystream, ctrblk);
-		xor_quad(src, keystream, min(nbytes, bsize));
-
 		/* increment counter in counterblock */
 		ctr_inc_quad(ctrblk + (bsize - countersize), countersize);
 
+		/* create keystream */
+		fn(crypto_cipher_tfm(tfm), keystream, ctrblk);
+		xor_quad(src, keystream, bsize);
+
 		if (nbytes < bsize)
 			break;
 
@@ -188,6 +190,7 @@ static int crypto_ctr_crypt(struct blkcipher_desc *desc,
 	u8 cblk[bsize + alignmask];
 	u8 *counterblk = (u8 *)ALIGN((unsigned long)cblk, alignmask + 1);
 	int err;
+	unsigned int remain;
 
 	blkcipher_walk_init(&walk, dst, src, nbytes);
 	err = blkcipher_walk_virt_block(desc, &walk, bsize);
@@ -197,21 +200,20 @@ static int crypto_ctr_crypt(struct blkcipher_desc *desc,
 	memcpy(counterblk, ctx->nonce, ictx->noncesize);
 	memcpy(counterblk + ictx->noncesize, walk.iv, ictx->ivsize);
 
-	/* initialize counter portion of counter block */
-	ctr_inc_quad(counterblk + (bsize - ictx->countersize),
-		     ictx->countersize);
-
 	while (walk.nbytes) {
+		remain = (walk.nbytes >= nbytes) ? 0 : (walk.nbytes % bsize);
+		nbytes -= (walk.nbytes - remain);
+
 		if (walk.src.virt.addr == walk.dst.virt.addr)
-			nbytes = crypto_ctr_crypt_inplace(&walk, child,
-							  counterblk,
-							  ictx->countersize);
+			crypto_ctr_crypt_inplace(&walk, child,
+						 counterblk,
+						 ictx->countersize, remain);
 		else
-			nbytes = crypto_ctr_crypt_segment(&walk, child,
-							  counterblk,
-							  ictx->countersize);
+			crypto_ctr_crypt_segment(&walk, child,
+						 counterblk,
+						 ictx->countersize, remain);
 
-		err = blkcipher_walk_done(desc, &walk, nbytes);
+		err = blkcipher_walk_done(desc, &walk, remain);
 	}
 	return err;
 }