Re: [patch 3/3] x86/fpu: Make FPU protection more robust

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Thu, 05 May 2022 03:21:43 +0200

Jason,

On Thu, May 05 2022 at 03:11, Jason A. Donenfeld wrote:
> On Thu, May 05, 2022 at 02:55:58AM +0200, Thomas Gleixner wrote:
>> > So if truly the only user of this is random.c as of 5.18 (is it? I'm
>> > assuming from a not very thorough survey...), and if the performance
>> > boost doesn't even exist, then yeah, I think it'd make sense to just get
>> > rid of it, and have kernel_fpu_usable() return false in those cases.
>> >
>> > I'll run some benchmarks on a little bit more hardware in representative
>> > cases and see.
>> 
>> Find below a combo patch which makes use of strict softirq serialization
>> for the price of not supporting the hardirq FPU usage. 
>
> Thanks, I'll give it a shot in the morning (3am) when trying to do a
> more realistic benchmark. But just as a synthetic thing, I ran the
> numbers in kBench900 and am getting:
>
>      generic:    430 cycles per call
>        ssse3:    315 cycles per call
>       avx512:    277 cycles per call
>
> for a single call to the compression function, which is the most any of
> those mix_pool_bytes() calls do from add_{input,disk}_randomness(), on
> Tiger Lake, using RDPMC from kernel space.

I'm well aware of the difference between synthetic benchmarks and
real-world scenarios, and with the more in-depth instrumentation of
these things I'm even more concerned that the difference is
underestimated.

> This _doesn't_ take into account the price of calling kernel_fpu_begin().
> That's a little hard to bench synthetically by running it in a loop and
> taking medians because of the lazy restoration. But that's an indication
> anyway that I should be looking at the cost of the actual function as
> it's running in random.c, rather than the synthetic test. Will keep this
> thread updated.
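
For reference, the "run it in a loop and take medians" approach can be
sketched roughly as below. This is only an illustration, not the code
behind the numbers above: median_u64(), bench_median() and the
read_cycles/fn callbacks are hypothetical names, and the cycle source
(RDPMC or anything else) is left to the caller. A median over a tight
loop presumably reflects mostly the steady-state iterations, which
would be one reason it says little about the kernel_fpu_begin() cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for uint64_t; avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Sort the samples in place and return the median. For an even
 * number of samples the lower of the two middle values is returned.
 */
uint64_t median_u64(uint64_t *samples, size_t n)
{
	assert(n > 0);
	qsort(samples, n, sizeof(*samples), cmp_u64);
	return samples[(n - 1) / 2];
}

/*
 * Hypothetical harness: time fn() n times using the caller supplied
 * cycle counter and return the median cost per call.
 */
uint64_t bench_median(uint64_t (*read_cycles)(void), void (*fn)(void),
		      uint64_t *samples, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t start = read_cycles();

		fn();
		samples[i] = read_cycles() - start;
	}
	return median_u64(samples, n);
}
```

The median is used rather than the mean so that a few outliers (an
interrupt landing in the middle of a sample, for instance) do not
skew the result.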

Appreciated.

Thanks,

        tglx