Hi Thomas,

On Wed, May 04, 2022 at 05:36:38PM +0200, Thomas Gleixner wrote:
> But the only use case which utilizes FPU from hard interrupt context is
> the random generator via add_randomness_...().
>
> I did a benchmark of these functions, which invoke blake2s_update()
> three times in a row, on a SKL-X and a ZEN3. The generic code and the
> FPU accelerated code are pretty much on par vs. execution time of the
> algorithm itself plus/minus noise.
>
> IOW, using the FPU blindly for this kind of computations is not
> necessarily a good plan. I have no idea how these things are analyzed
> and evaluated if at all. Maybe the crypto people can shed some light on
> this.

drivers/net/wireguard/{noise,cookie}.c makes pretty heavy use of BLAKE2s
in hot paths where the FPU is already being used for other algorithms,
and so there the save/restore is worth it (assuming restore finally
works lazily). In benchmarks, the SIMD code made a real difference.

But this presumably regards mix_pool_bytes() in the RNG. If it turns out
that supporting the FPU in hard IRQ context is a major PITA, and the RNG
is the only thing making use of it, then sure, drop hard IRQ context
support for it. However... This may be unearthing a larger bug.

Sebastian and I put in a decent amount of work during 5.18 to remove all
calls to mix_pool_bytes() (and hence to blake2s_compress()) from
add_interrupt_randomness(). Have a look:

https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/tree/drivers/char/random.c#n1289

It now accumulates in some per-CPU buffer, and then every 64 interrupts
a worker runs that does the actual mix_pool_bytes() from kthread
context.

So the question is: what is still hitting mix_pool_bytes() from hard IRQ
context? I'll investigate a bit and see.

Jason
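
For anyone following along, the accumulate-then-defer scheme described
above (cheap per-CPU accumulation on every interrupt, expensive mixing
from a worker every 64 interrupts) can be sketched roughly in userspace
C. This is not the code in random.c. The names fast_pool_sketch,
main_pool_sketch, irq_sample(), worker_mix() and simulate() are
hypothetical, the rotate-and-XOR fold and the multiply are placeholders
rather than the real mixing functions, and the worker is called inline
here where the kernel would schedule it to run later.

```c
#include <stddef.h>
#include <stdint.h>

/* The message says a worker runs every 64 interrupts. */
#define FLUSH_EVERY 64

/* Hypothetical stand-in for one CPU's accumulation buffer. */
struct fast_pool_sketch {
    uint64_t acc[4];     /* cheap accumulator touched on every "interrupt" */
    unsigned int count;  /* samples folded in since the last flush */
};

/* Hypothetical stand-in for the main pool; counts expensive mixes. */
struct main_pool_sketch {
    uint64_t state;
    unsigned int mixes;
};

/*
 * Stand-in for the expensive step (mix_pool_bytes() in the message).
 * The multiply is only a placeholder, not a real hash.
 */
static void worker_mix(struct main_pool_sketch *pool,
                       struct fast_pool_sketch *fp)
{
    for (size_t i = 0; i < 4; i++)
        pool->state = (pool->state ^ fp->acc[i]) * 0x9E3779B97F4A7C15ULL;
    pool->mixes++;
    fp->count = 0;  /* start counting toward the next flush */
}

/*
 * "Interrupt" path: a cheap fold only, no expensive mixing.
 * Returns 1 once enough samples have accumulated for the worker to run.
 */
static int irq_sample(struct fast_pool_sketch *fp, uint64_t sample)
{
    size_t slot = fp->count % 4;

    fp->acc[slot] = ((fp->acc[slot] << 7) | (fp->acc[slot] >> 57)) ^ sample;
    fp->count++;
    return fp->count >= FLUSH_EVERY;
}

/* Feed n simulated interrupts and report how often the worker ran. */
unsigned int simulate(unsigned int n)
{
    struct fast_pool_sketch fp = {0};
    struct main_pool_sketch pool = {0};

    for (unsigned int i = 0; i < n; i++) {
        if (irq_sample(&fp, i))
            worker_mix(&pool, &fp);  /* deferred in the kernel, inline here */
    }
    return pool.mixes;
}
```

The point of the split is that irq_sample() never calls the expensive
mix itself. It only signals that the worker is due, so the costly step
can run outside the sampling path.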