On Wed, Jan 09, 2019 at 12:47:33PM +0100, Sebastian Andrzej Siewior wrote:
> After changing the argument of __raw_xsave_addr() from a mask to a
> number, Dave suggested checking whether it makes sense to do the same
> for get_xsave_addr(). As it turns out, it does: get_xsave_addr() needs
> the mask only to check whether the requested feature is part of what
> is supported/saved, and then uses the number again. The shift
> operation is cheaper compared to "find last bit set". Also, the
> feature number uses less opcode space compared to the mask :)
>
> Make the get_xsave_addr() argument an xfeature number instead of a
> mask and fix up its callers.
>
> As part of this, use xfeature_nr and xfeature_mask consistently.

Good!

> This results in changes to the kvm code as:
> 	feature -> xfeature_mask
> 	index -> xfeature_nr
>
> Suggested-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> ---
>  arch/x86/include/asm/fpu/xstate.h |  4 ++--
>  arch/x86/kernel/fpu/xstate.c      | 23 +++++++++++------------
>  arch/x86/kernel/traps.c           |  2 +-
>  arch/x86/kvm/x86.c                | 28 ++++++++++++++--------------
>  arch/x86/mm/mpx.c                 |  6 +++---
>  5 files changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 48581988d78c7..fbe41f808e5d8 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -46,8 +46,8 @@ extern void __init update_regset_xstate_info(unsigned int size,
>  					     u64 xstate_mask);
>
>  void fpu__xstate_clear_all_cpu_caps(void);
> -void *get_xsave_addr(struct xregs_state *xsave, int xstate);
> -const void *get_xsave_field_ptr(int xstate_field);
> +void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
> +const void *get_xsave_field_ptr(int xfeature_nr);
>  int using_compacted_format(void);
>  int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
>  int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 0e759a032c1c7..d288e4d271b71 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -830,15 +830,15 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
>   *
>   * Inputs:
>   *	xstate: the thread's storage area for all FPU data
> - *	xstate_feature: state which is defined in xsave.h (e.g.
> - *	XFEATURE_MASK_FP, XFEATURE_MASK_SSE, etc...)
> + *	xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
> + *	XFEATURE_SSE, etc...)
>   * Output:
>   *	address of the state in the xsave area, or NULL if the
>   *	field is not present in the xsave buffer.
>   */
> -void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
> +void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
> {
> -	int xfeature_nr;
> +	u64 xfeature_mask = 1ULL << xfeature_nr;

You can use BIT_ULL(xfeature_nr) directly where you need it in this
function...

>  	/*
>  	 * Do we even *have* xsave state?
>  	 */
> @@ -850,11 +850,11 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
>  	 * have not enabled. Remember that pcntxt_mask is
>  	 * what we write to the XCR0 register.
>  	 */
> -	WARN_ONCE(!(xfeatures_mask & xstate_feature),
> +	WARN_ONCE(!(xfeatures_mask & xfeature_mask),

... and turn this into:

	WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr))

which is more readable than the AND of two variables whose names I had
to re-focus my eyes on to tell apart.
:)

Oh, and this way gcc generates better code, doing a BT directly:

  # arch/x86/kernel/fpu/xstate.c:852: 	WARN_ONCE(!(xfeatures_mask & BIT_ULL(xfeature_nr)),
	.loc 1 852 2 view .LVU258
	movq	xfeatures_mask(%rip), %rax	# xfeatures_mask, tmp124
	btq	%rsi, %rax	# xfeature_nr, tmp124

without first computing the shift into xfeature_mask:

  # arch/x86/kernel/fpu/xstate.c:841: 	u64 xfeature_mask = 1ULL << xfeature_nr;
	.loc 1 841 6 view .LVU249
	movl	%esi, %ecx	# xfeature_nr, tmp120
	movl	$1, %ebp	#, tmp105
	salq	%cl, %rbp	# tmp120, xfeature_mask

and then testing it:

  # arch/x86/kernel/fpu/xstate.c:853: 	WARN_ONCE(!(xfeatures_mask & xfeature_mask),
	.loc 1 853 2 view .LVU256
	testq	%rbp, xfeatures_mask(%rip)	# xfeature_mask, xfeatures_mask
	movq	%rdi, %rbx	# xsave, xsave

Otherwise a nice cleanup! Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.