On Tue, Dec 07, 2021, Lai Jiangshan wrote:
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index b70b36734bc0..0cb2c52377c8 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -252,23 +252,26 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  				  unsigned pte_access, unsigned pte_pkey,
>  				  unsigned pfec)
>  {
> -	int cpl = static_call(kvm_x86_get_cpl)(vcpu);
>  	unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
>  
>  	/*
> -	 * If CPL < 3, SMAP prevention are disabled if EFLAGS.AC = 1.
> +	 * If explicit supervisor accesses, SMAP is disabled

Slight reword, and each clause can fit on one line.

	 * For explicit supervisor accesses, SMAP is disabled if EFLAGS.AC = 1.
	 *
	 * For implicit supervisor accesses, SMAP cannot be overridden.

> +	 * if EFLAGS.AC = 1.
>  	 *
> -	 * If CPL = 3, SMAP applies to all supervisor-mode data accesses
> -	 * (these are implicit supervisor accesses) regardless of the value
> -	 * of EFLAGS.AC.
> +	 * If implicit supervisor accesses, SMAP can not be disabled
> +	 * regardless of the value EFLAGS.AC.
>  	 *
> -	 * This computes (cpl < 3) && (rflags & X86_EFLAGS_AC), leaving
> +	 * SMAP works on supervisor accesses only, and not_smap can
> +	 * be set or not set when user access with neither has any bearing
> +	 * on the result.

This is quite jumbled, I'd just drop it entirely, the interesting bits are the
rules for implicit vs. explicit accesses, and the blurb below that describes
the magic.

> +	 *
> +	 * This computes explicit_access && (rflags & X86_EFLAGS_AC), leaving

Too many &&, the logic below is a bitwise &, not a logical &&.

>  	 * the result in X86_EFLAGS_AC. We then insert it in place of
>  	 * the PFERR_RSVD_MASK bit; this bit will always be zero in pfec,
>  	 * but it will be one in index if SMAP checks are being overridden.
>  	 * It is important to keep this branchless.

Heh, so important that it incurs multiple branches and possible VMREADs in
vmx_get_cpl() and vmx_get_rflags().  And before static_call, multiple
retpolines to boot.  Probably a net win now as only the first
permission_fault() check for a given VM-Exit will be penalized, but the
comment is amusing nonetheless.

>  	 */
> -	unsigned long not_smap = (cpl - 3) & (rflags & X86_EFLAGS_AC);
> +	u32 not_smap = (rflags & X86_EFLAGS_AC) & vcpu->arch.explicit_access;

I really, really dislike shoving this into vcpu->arch.  I'd much prefer to
make this a property of the access, even if that means adding another param
or doing something gross with @access (@pfec here).

>  	int index = (pfec >> 1) +
>  		    (not_smap >> (X86_EFLAGS_AC_BIT - PFERR_RSVD_BIT + 1));
>  	bool fault = (mmu->permissions[index] >> pte_access) & 1;