Re: [PATCH 0/5] crypto: arm64 - disable NEON across scatterwalk API calls

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Sat, 2 Dec 2017 14:54:07 +0100

On Sat, Dec 02, 2017 at 09:11:46AM +0000, Ard Biesheuvel wrote:
> On 2 December 2017 at 09:01, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Fri, Dec 01, 2017 at 09:19:22PM +0000, Ard Biesheuvel wrote:
> >> Note that the remaining crypto drivers simply operate on fixed buffers, so
> >> while the RT crowd may still feel the need to disable those (and the ones
> >> below as well, perhaps), they don't call back into the crypto layer like
> >> the ones updated by this series, and so there's no room for improvement
> >> there AFAICT.
> >
> > Do these other drivers process all the blocks fed to them in one go
> > under a single NEON section, or do they do a single fixed block per
> > NEON invocation?
> 
> They consume the entire input in a single go, yes. But making it more
> granular than that is going to hurt performance, unless we introduce
> some kind of kernel_neon_yield(), which does a end+begin but only if
> the task is being scheduled out.

A little something like this:

https://lkml.kernel.org/r/20171201113235.6tmkwtov5cg2locv@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

> For example, the SHA256 keeps 256 bytes of round constants in NEON
> registers, and reloading those from memory for each 64 byte block of
> input is going to be noticeable. The same applies to the AES code
> (although the numbers are slightly different)

Quite. We could augment the above function with a return value that says
if we actually did a end/begin and registers were clobbered.