Hi Thomas,

On Thu, May 05, 2022 at 02:55:58AM +0200, Thomas Gleixner wrote:
> > So if truly the only user of this is random.c as of 5.18 (is it? I'm
> > assuming from a not very thorough survey...), and if the performance
> > boost doesn't even exist, then yeah, I think it'd make sense to just get
> > rid of it, and have kernel_fpu_usable() return false in those cases.
> >
> > I'll run some benchmarks on a little bit more hardware in representative
> > cases and see.
>
> Find below a combo patch which makes use of strict softirq serialization
> for the price of not supporting the hardirq FPU usage.

Thanks, I'll give it a shot in the morning (3am) when trying to do a more
realistic benchmark. But just as a synthetic thing, I ran the numbers in
kBench900 and am getting:

    generic:    430 cycles per call
    ssse3:      315 cycles per call
    avx512:     277 cycles per call

for a single call to the compression function, which is the most any of
those mix_pool_bytes() calls do from add_{input,disk}_randomness(), on
Tiger Lake, using RDPMC from kernel space.

This _doesn't_ take into account the price of calling
kernel_fpu_begin(). That's a little hard to bench synthetically by
running it in a loop and taking medians because of the lazy restoration.
But that's an indication anyway that I should be looking at the cost of
the actual function as it's running in random.c, rather than the
synthetic test. Will keep this thread updated.

Jason