RE: [PATCHv2] crypto: xts - Add support for Cipher Text Stealing

Ard,

> -----Original Message-----
> From: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> Sent: Thursday, August 8, 2019 9:45 AM
> To: Pascal van Leeuwen <pascalvanl@xxxxxxxxx>
> Cc: open list:HARDWARE RANDOM NUMBER GENERATOR CORE <linux-crypto@xxxxxxxxxxxxxxx>;
> Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Pascal
> Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxx>
> Subject: Re: [PATCHv2] crypto: xts - Add support for Cipher Text Stealing
> 
> Hello Pascal,
> 
> Thanks for looking into this.
> 
> On Thu, 8 Aug 2019 at 10:20, Pascal van Leeuwen <pascalvanl@xxxxxxxxx> wrote:
> >
> > This adds support for Cipher Text Stealing for data blocks that are not an
> > integer multiple of the cipher block size in size, bringing it fully in
> > line with the IEEE P1619/D16 standard.
> >
> > This has been tested with the AES-XTS test vectors from the IEEE P1619/D16
> > specification as well as some additional test vectors supplied to the
> > linux_crypto mailing list previously. It has also been fuzzed against
> > Inside Secure AES-XTS hardware which has been actively used in the field
> > for more than a decade already.
> >
> > changes since v1:
> > - Fixed buffer overflow issue due to subreq not being the last entry in
> >   rctx, this turns out to be a crypto API requirement. Thanks to Milan
> >   Broz <gmazyland@xxxxxxxxx> for finding this and providing the solution.
> > - Removed some redundant error returning code from the _finish_cts()
> >   functions that currently cannot fail, therefore would always return 0.
> > - removed rem_bytes computation behind init_crypt() in the encrypt() and
> >   decrypt() functions, no need to compute for lengths < 16
> > - Fixed comment style for single line comments
> >
> 
> Please put the changelog below the ---
> 
OK, I can resubmit with that fixed.
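If I understand the patch conventions correctly, git am discards
everything between the "---" marker and the start of the actual diff,
so a changelog placed there never ends up in the commit message, i.e.
roughly this layout (illustrative only, not the actual patch):

  Signed-off-by: Pascal van Leeuwen <...>
  ---
  Changes since v1:
  - Fixed buffer overflow issue ...

   crypto/xts.c | 229 +++++++++...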

> > Signed-off-by: Pascal van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxx>
> > ---
> >  crypto/xts.c | 229 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 209 insertions(+), 20 deletions(-)
> >
> > diff --git a/crypto/xts.c b/crypto/xts.c
> > index 33cf726..17b551d 100644
> > --- a/crypto/xts.c
> > +++ b/crypto/xts.c
> > @@ -1,7 +1,5 @@
> >  /* XTS: as defined in IEEE1619/D16
> >   *     http://grouper.ieee.org/groups/1619/email/pdf00086.pdf
> > - *     (sector sizes which are not a multiple of 16 bytes are,
> > - *     however currently unsupported)
> >   *
> >   * Copyright (c) 2007 Rik Snel <rsnel@xxxxxxxxxxxxxxx>
> >   *
> > @@ -28,6 +26,7 @@
> >
> >  struct priv {
> >         struct crypto_skcipher *child;
> > +       struct crypto_cipher *base;
> 
> Why do you need a separate cipher for the ciphertext stealing? ECB can
> be invoked with a single block just fine, and it will behave exactly
> like the bare cipher.
> 
As you already pointed out, it may use a different key than the tweak
cipher, and as I explained before, I really do *not* want to use the
skcipher here, which may be HW-accelerated and therefore have terrible
latency for a single block.

I want a SW implementation that is fast on a single block here, and I
can't call a library function directly, as the underlying cipher can be
anything (with a 128-bit block size).

Also, pushing it through the main skcipher was a bit more complexity
than I could manage, not being a software engineer by profession.
I leave the optimizations to people better equipped to do them :-)
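To illustrate the pattern I'm after - a minimal sketch, not the actual
patch; the helper name is made up, but the crypto_cipher calls are the
ones the patch uses, and that interface is synchronous and single-block
by definition:

static int alloc_cts_cipher(struct priv *ctx, const char *name,
			    const u8 *key, unsigned int keylen)
{
	/* Bare single-block cipher, e.g. "aes"; independent of the
	 * (possibly HW-backed, high-latency) skcipher doing the bulk.
	 */
	struct crypto_cipher *base = crypto_alloc_cipher(name, 0, 0);

	if (IS_ERR(base))
		return PTR_ERR(base);

	ctx->base = base;
	/* key it with Key1, the data key half */
	return crypto_cipher_setkey(base, key, keylen);
}

Processing the final block is then just a single synchronous call with
no request setup or completion overhead:

	crypto_cipher_encrypt_one(ctx->base, dst, src);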

> >         struct crypto_cipher *tweak;
> >  };
> >
> > @@ -37,7 +36,9 @@ struct xts_instance_ctx {
> >  };
> >
> >  struct rctx {
> > -       le128 t;
> > +       le128 t, tcur;
> > +       int rem_bytes, is_encrypt;
> 
> Instead of adding the is_encrypt flag, could we change crypt_done into
> encrypt_done/decrypt_done?
> 
That's possible, but what would be the advantage? OK, it would save one
conditional branch in the case where you do need CTS, but I doubt that
is significant compared to the total CTS overhead. The implementation is
far from optimal anyway, as the goal was to get something functional
first ...
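For reference, what I understand the suggestion to look like - an
untested sketch, where the common helper and its name are mine:

static void __crypt_done(struct crypto_async_request *areq, int err,
			 bool encrypt)
{
	struct skcipher_request *req = areq->data;

	if (!err) {
		struct rctx *rctx = skcipher_request_ctx(req);

		rctx->subreq.base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
		/* leave the final tweak where the CTS tail handlers
		 * expect to find it
		 */
		err = xor_tweak_post(req, &rctx->tcur);
		if (unlikely(!err && rctx->rem_bytes)) {
			if (encrypt)
				encrypt_finish_cts(req);
			else
				decrypt_finish_cts(req);
		}
	}

	skcipher_request_complete(req, err);
}

static void encrypt_done(struct crypto_async_request *areq, int err)
{
	__crypt_done(areq, err, true);
}

static void decrypt_done(struct crypto_async_request *areq, int err)
{
	__crypt_done(areq, err, false);
}

With init_crypt() then selecting encrypt_done or decrypt_done as the
completion callback, the is_encrypt flag could indeed go away.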

> > +       /* must be the last, expanded beyond end of struct! */
> >         struct skcipher_request subreq;
> 
> This is not sufficient. You have to add a TFM init() function which
> sets the request size. Please look at the cts code for an example.
>
I believe that is already done correctly (then again, I'm no expert).
Note that I did *not* add the subreq; it was already there. I just
added some more struct members, which needed to go above it, not below
it. I originally did not even know the struct could grow beyond its
declared size.
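For reference, the sizing already done in init_tfm() further down in
the patch (quoting just the relevant part):

	/* Room for our rctx plus whatever the child skcipher needs;
	 * this only works because subreq is the *last* member of
	 * struct rctx, so the child's request context can grow past
	 * the end of ours.
	 */
	crypto_skcipher_set_reqsize(tfm, crypto_skcipher_reqsize(child) +
					 sizeof(struct rctx));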

> >  };
> >
> > @@ -47,6 +48,7 @@ static int setkey(struct crypto_skcipher *parent, const u8 *key,
> >         struct priv *ctx = crypto_skcipher_ctx(parent);
> >         struct crypto_skcipher *child;
> >         struct crypto_cipher *tweak;
> > +       struct crypto_cipher *base;
> >         int err;
> >
> >         err = xts_verify_key(parent, key, keylen);
> > @@ -55,9 +57,11 @@ static int setkey(struct crypto_skcipher *parent, const u8 *key,
> >
> >         keylen /= 2;
> >
> > -       /* we need two cipher instances: one to compute the initial 'tweak'
> > -        * by encrypting the IV (usually the 'plain' iv) and the other
> > -        * one to encrypt and decrypt the data */
> > +       /* we need three cipher instances: one to compute the initial 'tweak'
> > +        * by encrypting the IV (usually the 'plain' iv), one to encrypt and
> > +        * decrypt the data and finally one to encrypt the last block(s) for
> > +        * cipher text stealing
> > +        */
> >
> >         /* tweak cipher, uses Key2 i.e. the second half of *key */
> >         tweak = ctx->tweak;
> > @@ -79,6 +83,13 @@ static int setkey(struct crypto_skcipher *parent, const u8 *key,
> >         crypto_skcipher_set_flags(parent, crypto_skcipher_get_flags(child) &
> >                                           CRYPTO_TFM_RES_MASK);
> >
> > +       /* Also data cipher, using Key1, for applying CTS */
> > +       base = ctx->base;
> > +       crypto_cipher_clear_flags(base, CRYPTO_TFM_REQ_MASK);
> > +       crypto_cipher_set_flags(base, crypto_skcipher_get_flags(parent) &
> > +                                     CRYPTO_TFM_REQ_MASK);
> > +       err = crypto_cipher_setkey(base, key, keylen);
> > +
> >         return err;
> >  }
> >
> > @@ -88,13 +99,12 @@ static int setkey(struct crypto_skcipher *parent, const u8 *key,
> >   * mutliple calls to the 'ecb(..)' instance, which usually would be slower than
> >   * just doing the gf128mul_x_ble() calls again.
> >   */
> > -static int xor_tweak(struct skcipher_request *req, bool second_pass)
> > +static int xor_tweak(struct skcipher_request *req, bool second_pass, le128 *t)
> >  {
> >         struct rctx *rctx = skcipher_request_ctx(req);
> >         struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> >         const int bs = XTS_BLOCK_SIZE;
> >         struct skcipher_walk w;
> > -       le128 t = rctx->t;
> >         int err;
> >
> >         if (second_pass) {
> > @@ -104,6 +114,7 @@ static int xor_tweak(struct skcipher_request *req, bool second_pass)
> >         }
> >         err = skcipher_walk_virt(&w, req, false);
> >
> > +       *t = rctx->t;
> >         while (w.nbytes) {
> >                 unsigned int avail = w.nbytes;
> >                 le128 *wsrc;
> > @@ -113,8 +124,8 @@ static int xor_tweak(struct skcipher_request *req, bool second_pass)
> >                 wdst = w.dst.virt.addr;
> >
> >                 do {
> > -                       le128_xor(wdst++, &t, wsrc++);
> > -                       gf128mul_x_ble(&t, &t);
> > +                       le128_xor(wdst++, t, wsrc++);
> > +                       gf128mul_x_ble(t, t);
> >                 } while ((avail -= bs) >= bs);
> >
> >                 err = skcipher_walk_done(&w, avail);
> > @@ -123,14 +134,97 @@ static int xor_tweak(struct skcipher_request *req, bool second_pass)
> >         return err;
> >  }
> >
> > -static int xor_tweak_pre(struct skcipher_request *req)
> > +static int xor_tweak_pre(struct skcipher_request *req, le128 *t)
> > +{
> > +       return xor_tweak(req, false, t);
> > +}
> > +
> > +static int xor_tweak_post(struct skcipher_request *req, le128 *t)
> >  {
> > -       return xor_tweak(req, false);
> > +       return xor_tweak(req, true, t);
> > +}
> > +
> > +static void encrypt_finish_cts(struct skcipher_request *req)
> > +{
> > +       struct rctx *rctx = skcipher_request_ctx(req);
> > +       /* Not a multiple of cipher blocksize, need CTS applied */
> > +       struct priv *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
> > +       le128 lastblock, lastptext;
> > +
> > +       /* Handle last partial block - apply Cipher Text Stealing */
> > +
> > +       /* Copy last ciphertext block just processed to buffer  */
> > +       sg_pcopy_to_buffer(req->dst, sg_nents(req->dst), &lastblock,
> > +                          XTS_BLOCK_SIZE,
> > +                          req->cryptlen - XTS_BLOCK_SIZE);
> > +       /* Save last plaintext bytes, next step may overwrite!! */
> > +       sg_pcopy_to_buffer(req->src, sg_nents(req->src), &lastptext,
> > +                          rctx->rem_bytes, req->cryptlen);
> > +       /* Copy first rem_bytes of ciphertext behind last full block */
> > +       sg_pcopy_from_buffer(req->dst, sg_nents(req->dst), &lastblock,
> > +                            rctx->rem_bytes, req->cryptlen);
> > +       /*
> > +        * Copy last remaining bytes of plaintext to combine buffer,
> > +        * replacing part of the ciphertext
> > +        */
> > +       memcpy(&lastblock, &lastptext, rctx->rem_bytes);
> > +       /* XTS encrypt the combined block */
> > +       le128_xor(&lastblock, &rctx->tcur, &lastblock);
> > +       crypto_cipher_encrypt_one(ctx->base, (u8 *)&lastblock,
> > +                                 (u8 *)&lastblock);
> > +       le128_xor(&lastblock, &rctx->tcur, &lastblock);
> > +       /* Write combined block to dst as 2nd last cipherblock */
> > +       sg_pcopy_from_buffer(req->dst, sg_nents(req->dst), &lastblock,
> > +                            XTS_BLOCK_SIZE,
> > +                            req->cryptlen - XTS_BLOCK_SIZE);
> > +
> > +       /* Fix up original request length */
> > +       req->cryptlen += rctx->rem_bytes;
> > +       return;
> >  }
> >
> > -static int xor_tweak_post(struct skcipher_request *req)
> > +static void decrypt_finish_cts(struct skcipher_request *req)
> >  {
> > -       return xor_tweak(req, true);
> > +       struct rctx *rctx = skcipher_request_ctx(req);
> > +       /* Not a multiple of cipher blocksize, need CTS applied */
> > +       struct priv *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
> > +       le128 tnext, lastblock, lastctext;
> > +
> > +       /* Handle last 2 (partial) blocks - apply Cipher Text Stealing */
> > +
> > +       /* Copy last full ciphertext block to buffer  */
> > +       sg_pcopy_to_buffer(req->src, sg_nents(req->src), &lastblock,
> > +                          XTS_BLOCK_SIZE, req->cryptlen);
> > +       /* Decrypt last full block using *next* tweak */
> > +       gf128mul_x_ble(&tnext, &rctx->tcur);
> > +       le128_xor(&lastblock, &tnext, &lastblock);
> > +       crypto_cipher_decrypt_one(ctx->base, (u8 *)&lastblock,
> > +                                 (u8 *)&lastblock);
> > +       le128_xor(&lastblock, &tnext, &lastblock);
> > +       /* Save last ciphertext bytes, next step may overwrite!! */
> > +       sg_pcopy_to_buffer(req->src, sg_nents(req->src), &lastctext,
> > +                          rctx->rem_bytes, req->cryptlen + XTS_BLOCK_SIZE);
> > +       /* Copy first rem_bytes of this ptext as last partial block */
> > +       sg_pcopy_from_buffer(req->dst, sg_nents(req->dst), &lastblock,
> > +                            rctx->rem_bytes,
> > +                            req->cryptlen + XTS_BLOCK_SIZE);
> > +       /*
> > +        * Copy last remaining bytes of "plaintext" to combine buffer,
> > +        * replacing part of the ciphertext
> > +        */
> > +       memcpy(&lastblock, &lastctext, rctx->rem_bytes);
> > +       /* XTS decrypt the combined block */
> > +       le128_xor(&lastblock, &rctx->tcur, &lastblock);
> > +       crypto_cipher_decrypt_one(ctx->base, (u8 *)&lastblock,
> > +                                 (u8 *)&lastblock);
> > +       le128_xor(&lastblock, &rctx->tcur, &lastblock);
> > +       /* Write combined block to dst as 2nd last plaintext block */
> > +       sg_pcopy_from_buffer(req->dst, sg_nents(req->dst), &lastblock,
> > +                            XTS_BLOCK_SIZE, req->cryptlen);
> > +
> > +       /* Fix up original request length */
> > +       req->cryptlen += rctx->rem_bytes + XTS_BLOCK_SIZE;
> > +       return;
> >  }
> >
> >  static void crypt_done(struct crypto_async_request *areq, int err)
> > @@ -139,9 +233,16 @@ static void crypt_done(struct crypto_async_request *areq, int err)
> >
> >         if (!err) {
> >                 struct rctx *rctx = skcipher_request_ctx(req);
> > +               le128 t;
> >
> >                 rctx->subreq.base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
> > -               err = xor_tweak_post(req);
> > +               err = xor_tweak_post(req, &t);
> > +
> > +               if (unlikely(!err && rctx->rem_bytes)) {
> > +                       rctx->is_encrypt ?
> > +                         encrypt_finish_cts(req) :
> > +                         decrypt_finish_cts(req);
> > +               }
> >         }
> >
> >         skcipher_request_complete(req, err);
> > @@ -167,10 +268,45 @@ static int encrypt(struct skcipher_request *req)
> >         struct rctx *rctx = skcipher_request_ctx(req);
> >         struct skcipher_request *subreq = &rctx->subreq;
> >
> > +       /* IEEE P1619 does not allow less data than block cipher blocksize */
> > +       if (unlikely(req->cryptlen < XTS_BLOCK_SIZE))
> > +               return -EINVAL;
> > +
> >         init_crypt(req);
> > -       return xor_tweak_pre(req) ?:
> > +
> > +       /* valid bytes in last crypto block */
> > +       rctx->rem_bytes = req->cryptlen & (XTS_BLOCK_SIZE - 1);
> > +       if (unlikely(rctx->rem_bytes)) {
> > +               /* Not a multiple of cipher blocksize, need CTS applied */
> > +               int err = 0;
> > +
> > +               /* First process all *full* cipher blocks */
> > +               req->cryptlen -= rctx->rem_bytes;
> > +               subreq->cryptlen -= rctx->rem_bytes;
> > +               err = xor_tweak_pre(req, &rctx->tcur);
> > +               if (err)
> > +                       goto encrypt_exit;
> > +               rctx->is_encrypt = 1;
> > +               err = crypto_skcipher_encrypt(subreq);
> > +               if (err)
> > +                       goto encrypt_exit;
> > +               err = xor_tweak_post(req, &rctx->tcur);
> > +               if (err)
> > +                       goto encrypt_exit;
> > +
> > +               encrypt_finish_cts(req);
> > +               return 0;
> > +
> > +encrypt_exit:
> > +               /* Fix up original request length */
> > +               req->cryptlen += rctx->rem_bytes;
> > +               return err;
> > +       }
> > +
> > +       /* Multiple of cipher blocksize, no CTS required */
> > +       return xor_tweak_pre(req, &rctx->tcur) ?:
> >                 crypto_skcipher_encrypt(subreq) ?:
> > -               xor_tweak_post(req);
> > +               xor_tweak_post(req, &rctx->tcur);
> >  }
> >
> >  static int decrypt(struct skcipher_request *req)
> > @@ -178,10 +314,50 @@ static int decrypt(struct skcipher_request *req)
> >         struct rctx *rctx = skcipher_request_ctx(req);
> >         struct skcipher_request *subreq = &rctx->subreq;
> >
> > +       /* IEEE P1619 does not allow less data than block cipher blocksize */
> > +       if (unlikely(req->cryptlen < XTS_BLOCK_SIZE))
> > +               return -EINVAL;
> > +
> >         init_crypt(req);
> > -       return xor_tweak_pre(req) ?:
> > +
> > +       /* valid bytes in last crypto block */
> > +       rctx->rem_bytes = req->cryptlen & (XTS_BLOCK_SIZE - 1);
> > +       if (unlikely(rctx->rem_bytes)) {
> > +               int err = 0;
> > +
> > +               /* First process all but the last(!) full cipher blocks */
> > +               req->cryptlen -= rctx->rem_bytes + XTS_BLOCK_SIZE;
> > +               subreq->cryptlen -= rctx->rem_bytes + XTS_BLOCK_SIZE;
> > +               /* May not have any full blocks to process here */
> > +               if (req->cryptlen) {
> > +                       err = xor_tweak_pre(req, &rctx->tcur);
> > +                       if (err)
> > +                               goto decrypt_exit;
> > +                       rctx->is_encrypt = 0;
> > +                       err = crypto_skcipher_decrypt(subreq);
> > +                       if (err)
> > +                               goto decrypt_exit;
> > +                       err = xor_tweak_post(req, &rctx->tcur);
> > +                       if (err)
> > +                               goto decrypt_exit;
> > +               } else {
> > +                       /* Start from initial tweak */
> > +                       rctx->tcur = rctx->t;
> > +               }
> > +
> > +               decrypt_finish_cts(req);
> > +               return 0;
> > +
> > +decrypt_exit:
> > +               /* Fix up original request length */
> > +               req->cryptlen += rctx->rem_bytes + XTS_BLOCK_SIZE;
> > +               return err;
> > +       }
> > +
> > +       /* Multiple of cipher blocksize, no CTS required */
> > +       return xor_tweak_pre(req, &rctx->tcur) ?:
> >                 crypto_skcipher_decrypt(subreq) ?:
> > -               xor_tweak_post(req);
> > +               xor_tweak_post(req, &rctx->tcur);
> >  }
> >
> >  static int init_tfm(struct crypto_skcipher *tfm)
> > @@ -191,6 +367,7 @@ static int init_tfm(struct crypto_skcipher *tfm)
> >         struct priv *ctx = crypto_skcipher_ctx(tfm);
> >         struct crypto_skcipher *child;
> >         struct crypto_cipher *tweak;
> > +       struct crypto_cipher *base;
> >
> >         child = crypto_spawn_skcipher(&ictx->spawn);
> >         if (IS_ERR(child))
> > @@ -206,6 +383,16 @@ static int init_tfm(struct crypto_skcipher *tfm)
> >
> >         ctx->tweak = tweak;
> >
> > +       base = crypto_alloc_cipher(ictx->name, 0, 0);
> > +       if (IS_ERR(base)) {
> > +               crypto_free_skcipher(ctx->child);
> > +               crypto_free_cipher(ctx->tweak);
> > +               return PTR_ERR(base);
> > +       }
> > +
> > +       ctx->base = base;
> > +
> > +       /* struct rctx expanded by sub cipher request size! */
> >         crypto_skcipher_set_reqsize(tfm, crypto_skcipher_reqsize(child) +
> >                                          sizeof(struct rctx));
> >
> > @@ -218,6 +405,7 @@ static void exit_tfm(struct crypto_skcipher *tfm)
> >
> >         crypto_free_skcipher(ctx->child);
> >         crypto_free_cipher(ctx->tweak);
> > +       crypto_free_cipher(ctx->base);
> >  }
> >
> >  static void free(struct skcipher_instance *inst)
> > @@ -314,11 +502,12 @@ static int create(struct crypto_template *tmpl, struct rtattr **tb)
> >
> >         inst->alg.base.cra_flags = alg->base.cra_flags & CRYPTO_ALG_ASYNC;
> >         inst->alg.base.cra_priority = alg->base.cra_priority;
> > -       inst->alg.base.cra_blocksize = XTS_BLOCK_SIZE;
> > +       inst->alg.base.cra_blocksize = 1;
> 
> I don't think this is necessary or correct.
> 
> >         inst->alg.base.cra_alignmask = alg->base.cra_alignmask |
> >                                        (__alignof__(u64) - 1);
> >
> >         inst->alg.ivsize = XTS_BLOCK_SIZE;
> > +       inst->alg.chunksize = XTS_BLOCK_SIZE;
> 
> ... and you don't need this if you drop the above change.
> 
> >         inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(alg) * 2;
> >         inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(alg) * 2;
> >
> > --
> > 1.8.3.1

Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com




