> -----Original Message----- > From: Jason A. Donenfeld <Jason@xxxxxxxxx> > Sent: Wednesday, October 12, 2022 7:36 PM > To: Elliott, Robert (Servers) <elliott@xxxxxxx> > Cc: herbert@xxxxxxxxxxxxxxxxxxx; davem@xxxxxxxxxxxxx; > tim.c.chen@xxxxxxxxxxxxxxx; ap420073@xxxxxxxxx; ardb@xxxxxxxxxx; linux- > crypto@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx > Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit > > On Wed, Oct 12, 2022 at 04:59:21PM -0500, Robert Elliott wrote: > > Use a common macro name (FPU_BYTES) for the limit of the number of bytes > > processed within kernel_fpu_begin and kernel_fpu_end rather than > > using SZ_4K (which is a signed value), or a magic value of 4096U. > > Not sure I like this very much. The whole idea is that this is variable > per algorithm, since not all algorithms have the same performance > characteristics. Good point. I noticed the powerpc aes, sha1, and sha256 modules include explanations of how their macros serving a similar purpose were calculated. arch/powerpc/crypto/aes-spe-glue.c: * MAX_BYTES defines the number of bytes that are allowed to be processed * between preempt_disable() and preempt_enable(). e500 cores can issue two * instructions per clock cycle using one 32/64 bit unit (SU1) and one 32 * bit unit (SU2). One of these can be a memory access that is executed via * a single load and store unit (LSU). XTS-AES-256 takes ~780 operations per * 16 byte block or 25 cycles per byte. Thus 768 bytes of input data * will need an estimated maximum of 20,000 cycles. Headroom for cache misses * included. Even with the low end model clocked at 667 MHz this equals to a * critical time window of less than 30us. The value has been chosen to * process a 512 byte disk block in one or a large 1400 bytes IPsec network * packet in two runs. #define MAX_BYTES 768 and arch/powerpc/crypto/sha1-spe-glue.c: * MAX_BYTES defines the number of bytes that are allowed to be processed * between preempt_disable() and preempt_enable(). SHA1 takes ~1000 * operations per 64 bytes. e500 cores can issue two arithmetic instructions * per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2). * Thus 2KB of input data will need an estimated maximum of 18,000 cycles. * Headroom for cache misses included. Even with the low end model clocked * at 667 MHz this equals to a critical time window of less than 27us. #define MAX_BYTES 2048 arch/powerpc/crypto/sha256-spe-glue.c: * MAX_BYTES defines the number of bytes that are allowed to be processed * between preempt_disable() and preempt_enable(). SHA256 takes ~2,000 * operations per 64 bytes. e500 cores can issue two arithmetic instructions * per clock cycle using one 32/64 bit unit (SU1) and one 32 bit unit (SU2). * Thus 1KB of input data will need an estimated maximum of 18,000 cycles. * Headroom for cache misses included. Even with the low end model clocked * at 667 MHz this equals to a critical time window of less than 27us. #define MAX_BYTES 1024 Perhaps we should declare a time goal like "30 us," measure the actual speed of each algorithm with a tcrypt speed test, and calculate the nominal value assuming some slow x86 CPU core speed? That could be further adjusted at run-time based on the supposed minimum CPU frequency (e.g., as reported in /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq). If values less than 4 KiB are necessary (e.g., like the powerpc values), that will require changes in all the modules using skcipher_walk too. > So in that sense, it's better to put this close to > where it's actually used, rather than somewhere at the top of the file. > When you do that, it makes it seem like "FPU_BYTES" is some universal > constant, which of course it isn't. > > Instead, declare this as an untyped enum value within the function. Many of these modules use the same value for both an _update and a _finit function (usually pretty close to each other). Is it important to avoid replication?