On Mon, Dec 19, 2022 at 09:40:42PM -0800, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@xxxxxxxxxx>
>
> Add a comment that explains what ghash_setkey() is doing, as it's hard
> to understand otherwise.  Also fix a broken hyperlink.
>
> Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
> ---
>  arch/x86/crypto/ghash-clmulni-intel_asm.S  |  2 +-
>  arch/x86/crypto/ghash-clmulni-intel_glue.c | 27 ++++++++++++++++++----
>  2 files changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
> index 9dfeb4d31b92..257ed9446f3e 100644
> --- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
> +++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
> @@ -4,7 +4,7 @@
>   * instructions. This file contains accelerated part of ghash
>   * implementation. More information about PCLMULQDQ can be found at:
>   *
> - * http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/
> + * https://www.intel.com/content/dam/develop/external/us/en/documents/clmul-wp-rev-2-02-2014-04-20.pdf

Since these things have a habit of changing, we tend to prefer to host a
copy on kernel.org somewhere (used to be bugzilla, but perhaps there's a
better place these days).

>   *
>   * Copyright (c) 2009 Intel Corp.
>   * Author: Huang Ying <ying.huang@xxxxxxxxx>
> diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
> index 9453b094bb3b..700ecaee9a08 100644
> --- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
> +++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
> @@ -60,16 +60,35 @@ static int ghash_setkey(struct crypto_shash *tfm,
>  	if (keylen != GHASH_BLOCK_SIZE)
>  		return -EINVAL;
>
> -	/* perform multiplication by 'x' in GF(2^128) */
> +	/*
> +	 * GHASH maps bits to polynomial coefficients backwards, which makes it
> +	 * hard to implement.  But it can be shown that the GHASH multiplication
> +	 *
> +	 *	D * K (mod x^128 + x^7 + x^2 + x + 1)
> +	 *
> +	 * (where D is a data block and K is the key) is equivalent to:
> +	 *
> +	 *	bitreflect(D) * bitreflect(K) * x^(-127)
> +	 *		(mod x^128 + x^127 + x^126 + x^121 + 1)
> +	 *
> +	 * So, the code below precomputes:
> +	 *
> +	 *	bitreflect(K) * x^(-127) (mod x^128 + x^127 + x^126 + x^121 + 1)
> +	 *
> +	 * ... but in Montgomery form (so that Montgomery multiplication can be
> +	 * used), i.e. with an extra x^128 factor, which means actually:
> +	 *
> +	 *	bitreflect(K) * x (mod x^128 + x^127 + x^126 + x^121 + 1)
> +	 *
> +	 * The within-a-byte part of bitreflect() cancels out GHASH's built-in
> +	 * reflection, and thus bitreflect() is actually a byteswap.
> +	 */

Whee, thanks, that was indeed entirely non-obvious.
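
For anyone else reading along, the precomputation described in the quoted
comment can be sketched in user space roughly as follows. This is only an
illustration under stated assumptions, not the code from the patch: the
names ghash_key_sketch, load_be64() and ghash_precompute_sketch() are made
up here, and the hi/lo layout is chosen for readability rather than to
match the kernel's context struct. The Montgomery step is just
x^(-127) * x^128 = x, so what remains is a byteswap of K followed by one
multiplication by x, where an overflowing x^128 term is folded back in as
x^127 + x^126 + x^121 + 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the precomputed key; not a kernel struct. */
struct ghash_key_sketch {
	uint64_t lo;	/* coefficients of x^0  .. x^63  */
	uint64_t hi;	/* coefficients of x^64 .. x^127 */
};

/* Read 8 bytes as a big-endian integer (the "byteswap" on little-endian). */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Compute byteswap(K) * x (mod x^128 + x^127 + x^126 + x^121 + 1).
 * Assumption: the first 8 key bytes form the high half of the polynomial.
 */
static void ghash_precompute_sketch(const uint8_t key[16],
				    struct ghash_key_sketch *out)
{
	uint64_t hi, lo;

	assert(key && out);

	hi = load_be64(key);
	lo = load_be64(key + 8);

	/* Multiply by x: shift the 128-bit value left by one bit. */
	out->hi = (hi << 1) | (lo >> 63);
	out->lo = lo << 1;

	/*
	 * If the x^127 coefficient was set, the shift produced x^128, which
	 * reduces to x^127 + x^126 + x^121 + 1.  The first three terms are
	 * bits 63, 62 and 57 of the high half (0xc2 << 56); the final "+ 1"
	 * is bit 0 of the low half.
	 */
	if (hi >> 63) {
		out->hi ^= (uint64_t)0xc2 << 56;
		out->lo ^= 1;
	}
}
```

A caller would pass the 16-byte hash key and read the two halves back from
the struct. Any real user would still need the matching Montgomery
multiplication on the data path, which this sketch does not attempt.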