On Tue, Jan 15, 2019 at 11:46 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 1/15/19 4:44 AM, David Laight wrote:
> > Once this is done it might be worth while adding a parameter to
> > kernel_fpu_begin() to request the registers only when they don't
> > need saving.
> > This would benefit code paths where the gains are reasonable but not massive.
> >
> > The return value from kernel_fpu_begin() ought to indicate which
> > registers are available - none, SSE, SSE2, AVX, AVX512 etc.
> > So code can use an appropriate implementation.
> > (I've not looked to see if this is already the case!)
>
> Yeah, it would be sane to have both a mask passed, and returned, say:
>
>         got = kernel_fpu_begin(XFEATURE_MASK_AVX512, NO_XSAVE_ALLOWED);
>
>         if (got == XFEATURE_MASK_AVX512)
>                 do_avx_512_goo();
>         else
>                 do_integer_goo();
>
>         kernel_fpu_end(got)
>
> Then, kernel_fpu_begin() can actually work without even *doing* an XSAVE:
>
>         /* Do we have to save state for anything in 'ask_mask'? */
>         if (all_states_are_init(ask_mask))
>                 return ask_mask;
>
> Then kernel_fpu_end() just needs to zero out (re-init) the state, which
> it can do with XRSTORS and a careful combination of XSTATE_BV and the
> requested feature bitmap (RFBM).
>
> This is all just optimization, though.

I don't think we'd ever want kernel_fpu_end() to restore anything,
right?  I'm a bit confused as to when this optimization would actually
be useful.

Jason Donenfeld has a rather nice API for this in his Zinc series.
Jason, how is that coming?
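
For reference, the mask-in/mask-out idea from the quoted message can be modelled in user space roughly as below. This is only a sketch under stated assumptions, not kernel code: the model_* names and variables are hypothetical, NO_XSAVE_ALLOWED is taken to mean "return 0 rather than save anything" (the thread does not define it), and a plain integer stands in for the hardware's in-use state. The XFEATURE_MASK_* bit positions follow the x86 XSAVE state-component numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XSAVE state-component bits (x86 numbering). */
#define XFEATURE_MASK_OPMASK    (1ULL << 5)
#define XFEATURE_MASK_ZMM_Hi256 (1ULL << 6)
#define XFEATURE_MASK_Hi16_ZMM  (1ULL << 7)
#define XFEATURE_MASK_AVX512    (XFEATURE_MASK_OPMASK | \
                                 XFEATURE_MASK_ZMM_Hi256 | \
                                 XFEATURE_MASK_Hi16_ZMM)

/* Assumed meaning: fail (return 0) instead of saving state. */
#define NO_XSAVE_ALLOWED 1u

/* Hypothetical model state, standing in for real hardware. */
static uint64_t model_xinuse;     /* components not in their init state */
static uint64_t model_granted;    /* mask handed out by the last begin  */
static int      model_xsave_count;/* how often a save would have run    */

/* True if nothing in 'ask_mask' holds live state. */
static bool all_states_are_init(uint64_t ask_mask)
{
        return (model_xinuse & ask_mask) == 0;
}

/* Model of the proposed kernel_fpu_begin(ask_mask, flags). */
static uint64_t model_fpu_begin(uint64_t ask_mask, unsigned int flags)
{
        model_granted = 0;

        if (!all_states_are_init(ask_mask)) {
                if (flags & NO_XSAVE_ALLOWED)
                        return 0;       /* caller takes the integer path */
                model_xsave_count++;    /* real code would XSAVE here */
        }

        model_xinuse |= ask_mask;       /* the caller will dirty these */
        model_granted = ask_mask;
        return ask_mask;
}

/* Model of the proposed kernel_fpu_end(got): re-init, never restore. */
static void model_fpu_end(uint64_t got)
{
        assert(got == model_granted);   /* hand back what was granted */
        model_xinuse &= ~got;           /* components return to init */
        model_granted = 0;
}
```

In this model the fast path costs nothing at begin time and only a re-init at end time, which is the case the quoted proposal is aimed at. The reloading of any saved user state before returning to user space is deliberately not modelled.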