RE: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Jason A. Donenfeld <Jason@xxxxxxxxx>
> Sent: Wednesday, October 12, 2022 7:36 PM
> To: Elliott, Robert (Servers) <elliott@xxxxxxx>
> Cc: herbert@xxxxxxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx;
> tim.c.chen@xxxxxxxxxxxxxxx; ap420073@xxxxxxxxx; ardb@xxxxxxxxxx; linux-
> crypto@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit
> 
> On Wed, Oct 12, 2022 at 04:59:21PM -0500, Robert Elliott wrote:
> > Use a common macro name (FPU_BYTES) for the limit of the number of bytes
> > processed within kernel_fpu_begin and kernel_fpu_end rather than
> > using SZ_4K (which is a signed value), or a magic value of 4096U.
> 
> Not sure I like this very much. The whole idea is that this is variable
> per algorithm, since not all algorithms have the same performance
> characteristics. 

Good point.

I noticed the powerpc aes, sha1, and sha256 modules include explanations
of how their macros serving a similar purpose were calculated.

arch/powerpc/crypto/aes-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
 * between preempt_disable() and preempt_enable(). e500 cores can issue two
 * instructions per clock cycle using one 32/64 bit unit (SU1) and one 32
 * bit unit (SU2). One of these can be a memory access that is executed via
 * a single load and store unit (LSU). XTS-AES-256 takes ~780 operations per
 * 16 byte block or 25 cycles per byte. Thus 768 bytes of input data
 * will need an estimated maximum of 20,000 cycles. Headroom for cache misses
 * included. Even with the low end model clocked at 667 MHz this equals to a
 * critical time window of less than 30us. The value has been chosen to
 * process a 512 byte disk block in one or a large 1400 bytes IPsec network
 * packet in two runs.
#define MAX_BYTES 768

and arch/powerpc/crypto/sha1-spe-glue.c:
 * MAX_BYTES defines the number of bytes that are allowed to be processed
 * between preempt_disable() and preempt_enable(). SHA1 takes ~1000
 * operations per 64 bytes. e500 cores can issue two arithmetic instructions
 * per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
 * Thus 2KB of input data will need an estimated maximum of 18,000 cycles.
 * Headroom for cache misses included. Even with the low end model clocked
 * at 667 MHz this equals to a critical time window of less than 27us.
#define MAX_BYTES 2048

arch/powerpc/crypto/sha256-spe-glue.c:
* MAX_BYTES defines the number of bytes that are allowed to be processed
 * between preempt_disable() and preempt_enable(). SHA256 takes ~2,000
 * operations per 64 bytes. e500 cores can issue two arithmetic instructions
 * per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2).
 * Thus 1KB of input data will need an estimated maximum of 18,000 cycles.
 * Headroom for cache misses included. Even with the low end model clocked
 * at 667 MHz this equals to a critical time window of less than 27us.
#define MAX_BYTES 1024

Perhaps we should declare a time goal like "30 us," measure the actual
speed of each algorithm with a tcrypt speed test, and calculate the
nominal value assuming some slow x86 CPU core speed?

That could be further adjusted at run-time based on the supposed
minimum CPU frequency (e.g., as reported in
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq).

If values less than 4 KiB are necessary (e.g., like the powerpc
values), that will require changes in all the modules using 
skcipher_walk too.

> So in that sense, it's better to put this close to
> where it's actually used, rather than somewhere at the top of the file.
> When you do that, it makes it seem like "FPU_BYTES" is some universal
> constant, which of course it isn't.
> 
> Instead, declare this as an untyped enum value within the function. 

Many of these modules use the same value for both an _update and
a _finit function (usually pretty close to each other). Is it
important to avoid replication?






[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]
  Powered by Linux