On Wed, May 29, 2019 at 08:10:56PM +0300, Iuliana Prodan wrote:
> The generic GCM driver should ensure that whatever it passes into
> scatterlists is safe for non-cache coherent DMA.
> The issue was seen while running GCM on CAAM driver. But, since CAAM
> does not support GHASH on i.MX6, only CTR skcipher part of the GCM is
> offloaded.
> The skcipher request received by CAAM has req->src pointing to
> auth_tag[16] and req->iv pointing to iv[16]. Problem is that when
> the iv is updated (crypto API requires skcipher implementations to
> update the IV with the last ciphertext block) is written in iv[16],
> which is on the same cacheline as auth_tag[16] that was previously
> DMA mapped.
> Solution is to use a pointer, aligned to cache line, instead of auth_tag
> buffer, for encryption/decryption and then free it on completion.
>
> Link: https://lore.kernel.org/linux-crypto/20190208114459.5nixe76xmmkhur75@xxxxxxxxxxxxxxxxxxx/
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.19+
> Fixes: adcbc688fe2f ("crypto: gcm - Convert to new AEAD interface")
> Suggested-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> Signed-off-by: Iuliana Prodan <iuliana.prodan@xxxxxxx>
>
> ---
> I've checked the reproducibility of this issue starting with 4.19.y.
> ---
>  crypto/gcm.c         | 26 +++++++++++++++++---------
>  include/crypto/gcm.h |  1 +
>  2 files changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/crypto/gcm.c b/crypto/gcm.c
> index 33f45a9..53e3ce5 100644
> --- a/crypto/gcm.c
> +++ b/crypto/gcm.c
> @@ -66,7 +66,7 @@ struct crypto_gcm_ghash_ctx {
>
>  struct crypto_gcm_req_priv_ctx {
>  	u8 iv[16];
> -	u8 auth_tag[16];
> +	u8 *auth_tag;
>  	u8 iauth_tag[16];
>  	struct scatterlist src[3];
>  	struct scatterlist dst[3];
> @@ -177,19 +177,23 @@ static void crypto_gcm_init_common(struct aead_request *req)
>  	__be32 counter = cpu_to_be32(1);
>  	struct scatterlist *sg;
>
> -	memset(pctx->auth_tag, 0, sizeof(pctx->auth_tag));
> +	/*
> +	 * kzalloc alignment is at least the cache line size
> +	 * for non-cache coherent architectures.
> +	 */
> +	pctx->auth_tag = kzalloc(GCM_MAX_AUTH_SIZE, GFP_KERNEL);
>  	memcpy(pctx->iv, req->iv, GCM_AES_IV_SIZE);
>  	memcpy(pctx->iv + GCM_AES_IV_SIZE, &counter, 4);
>
>  	sg_init_table(pctx->src, 3);
> -	sg_set_buf(pctx->src, pctx->auth_tag, sizeof(pctx->auth_tag));
> +	sg_set_buf(pctx->src, pctx->auth_tag, GCM_MAX_AUTH_SIZE);
>  	sg = scatterwalk_ffwd(pctx->src + 1, req->src, req->assoclen);
>  	if (sg != pctx->src + 1)
>  		sg_chain(pctx->src, 2, sg);
>
>  	if (req->src != req->dst) {
>  		sg_init_table(pctx->dst, 3);
> -		sg_set_buf(pctx->dst, pctx->auth_tag, sizeof(pctx->auth_tag));
> +		sg_set_buf(pctx->dst, pctx->auth_tag, GCM_MAX_AUTH_SIZE);
>  		sg = scatterwalk_ffwd(pctx->dst + 1, req->dst, req->assoclen);
>  		if (sg != pctx->dst + 1)
>  			sg_chain(pctx->dst, 2, sg);
> @@ -208,9 +212,8 @@ static void crypto_gcm_init_crypt(struct aead_request *req,
>  	dst = req->src == req->dst ? pctx->src : pctx->dst;
>
>  	skcipher_request_set_tfm(skreq, ctx->ctr);
> -	skcipher_request_set_crypt(skreq, pctx->src, dst,
> -				     cryptlen + sizeof(pctx->auth_tag),
> -				     pctx->iv);
> +	skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen +
> +				    GCM_MAX_AUTH_SIZE, pctx->iv);
>  }
>
>  static inline unsigned int gcm_remain(unsigned int len)
> @@ -440,6 +443,7 @@ static int gcm_enc_copy_hash(struct aead_request *req, u32 flags)
>  	scatterwalk_map_and_copy(auth_tag, req->dst,
>  				 req->assoclen + req->cryptlen,
>  				 crypto_aead_authsize(aead), 1);
> +	kfree(auth_tag);
>  	return 0;
>  }
>
> @@ -492,11 +496,15 @@ static int crypto_gcm_verify(struct aead_request *req)
>  	u8 *iauth_tag = pctx->iauth_tag;
>  	unsigned int authsize = crypto_aead_authsize(aead);
>  	unsigned int cryptlen = req->cryptlen - authsize;
> +	int err;
>
>  	crypto_xor(auth_tag, iauth_tag, 16);
>  	scatterwalk_map_and_copy(iauth_tag, req->src,
>  				 req->assoclen + cryptlen, authsize, 0);
> -	return crypto_memneq(iauth_tag, auth_tag, authsize) ? -EBADMSG : 0;
> +	err = crypto_memneq(iauth_tag, auth_tag, authsize) ? -EBADMSG : 0;
> +	kfree(auth_tag);
> +
> +	return err;
>  }
>

So what about the other places that also pass an IV located next to the
data, like crypto/ccm.c and crypto/adiantum.c?  If we're actually going
to make this a new API requirement, then we need to add a debugging
option that makes the API detect this violation so that the other places
can be fixed too.

Also, doing a kmalloc() per request is inefficient and very error-prone.
In fact there are at least 3 bugs here: (1) not checking the return
value, (2) incorrectly using GFP_KERNEL when it may be atomic context,
and (3) not always freeing the memory.  Why not use cacheline-aligned
memory within the request context, so that a separate kmalloc() isn't
needed?

Also, did you consider whether there's any way to make the crypto API
handle this automatically, so that all the individual users don't have
to?

- Eric