On Mon, Apr 20, 2020 at 01:57:11AM -0600, Jason A. Donenfeld wrote:
> The initial Zinc patchset, after some mailing list discussion, contained
> code to ensure that kernel_fpu_enable would not be kept on for more than
> a PAGE_SIZE chunk, since it disables preemption. The choice of PAGE_SIZE
> isn't totally scientific, but it's not a bad guess either, and it's
> what's used in both the x86 poly1305 and blake2s library code already.
> Unfortunately it appears to have been left out of the final patchset
> that actually added the glue code. So, this commit adds back the
> PAGE_SIZE chunking.
>
> Fixes: 84e03fa39fbe ("crypto: x86/chacha - expose SIMD ChaCha routine as library function")
> Fixes: b3aad5bad26a ("crypto: arm64/chacha - expose arm64 ChaCha routine as library function")
> Fixes: a44a3430d71b ("crypto: arm/chacha - expose ARM ChaCha routine as library function")
> Fixes: f569ca164751 ("crypto: arm64/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
> Fixes: a6b803b3ddc7 ("crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation")
> Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> ---
> Eric, Ard - I'm wondering if this was in fact just an oversight in Ard's
> patches, or if there was actually some later discussion in which we
> concluded that the PAGE_SIZE chunking wasn't required, perhaps because
> of FPU changes. If that's the case, please do let me know, in which case
> I'll submit a _different_ patch that removes the chunking from x86 poly
> and blake. I can't find any emails that would indicate that, but I might
> be mistaken.
>
>  arch/arm/crypto/chacha-glue.c        | 16 +++++++++++++---
>  arch/arm/crypto/poly1305-glue.c      | 17 +++++++++++++----
>  arch/arm64/crypto/chacha-neon-glue.c | 16 +++++++++++++---
>  arch/arm64/crypto/poly1305-glue.c    | 17 +++++++++++++----
>  arch/x86/crypto/chacha_glue.c        | 16 +++++++++++++---
>  5 files changed, 65 insertions(+), 17 deletions(-)

I don't think you're missing anything.

On x86, kernel_fpu_begin() and kernel_fpu_end() did get optimized in v5.2.
But they still disable preemption, which is the concern here.

>
> diff --git a/arch/arm/crypto/chacha-glue.c b/arch/arm/crypto/chacha-glue.c
> index 6fdb0ac62b3d..0e29ebac95fd 100644
> --- a/arch/arm/crypto/chacha-glue.c
> +++ b/arch/arm/crypto/chacha-glue.c
> @@ -91,9 +91,19 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes,
>  		return;
>  	}
>
> -	kernel_neon_begin();
> -	chacha_doneon(state, dst, src, bytes, nrounds);
> -	kernel_neon_end();
> +	for (;;) {
> +		unsigned int todo = min_t(unsigned int, PAGE_SIZE, bytes);
> +
> +		kernel_neon_begin();
> +		chacha_doneon(state, dst, src, todo, nrounds);
> +		kernel_neon_end();
> +
> +		bytes -= todo;
> +		if (!bytes)
> +			break;
> +		src += todo;
> +		dst += todo;
> +	}
>  }
>  EXPORT_SYMBOL(chacha_crypt_arch);

Seems this should just be a 'while' loop?

	while (bytes) {
		unsigned int todo = min_t(unsigned int, PAGE_SIZE, bytes);

		kernel_neon_begin();
		chacha_doneon(state, dst, src, todo, nrounds);
		kernel_neon_end();

		bytes -= todo;
		src += todo;
		dst += todo;
	}

Likewise elsewhere in this patch.  (For Poly1305, len >= POLY1305_BLOCK_SIZE
at the beginning, so that could use a 'do' loop, as sketched below.)

- Eric
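
For the arm64 Poly1305 glue, that 'do' loop might look roughly like the
following. This is a sketch only, not what the patch actually does: the
poly1305_blocks_neon() call and the dctx/src/len locals are assumed from the
surrounding glue code, and len is taken to be a nonzero multiple of
POLY1305_BLOCK_SIZE on entry.

	do {
		/*
		 * PAGE_SIZE is a multiple of POLY1305_BLOCK_SIZE, so each
		 * chunk handed to the NEON routine stays block-aligned.
		 */
		unsigned int todo = min_t(unsigned int, PAGE_SIZE, len);

		kernel_neon_begin();
		poly1305_blocks_neon(&dctx->h, src, todo, 1);
		kernel_neon_end();

		len -= todo;
		src += todo;
	} while (len);

A 'do' loop fits there because this point is only reached with at least one
block to process, so the first kernel_neon_begin()/kernel_neon_end() section
always has work to do.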