On Tue, 2019-04-23 at 08:08 -0700, Sean Christopherson wrote:
> On Mon, Apr 22, 2019 at 06:57:01PM -0700, Huang, Kai wrote:
> > On Mon, 2019-04-22 at 09:39 -0700, Sean Christopherson wrote:
> > > On Tue, Apr 16, 2019 at 09:19:48PM +1200, Kai Huang wrote:
> > > > With both Intel MKTME and AMD SME/SEV, physical address bits are reduced
> > > > due to several high bits of physical address are repurposed for memory
> > > > encryption. To honor such behavior those repurposed bits are reduced from
> > > > cpuinfo_x86->x86_phys_bits for both Intel MKTME and AMD SME/SEV, thus
> > > > boot_cpu_data.x86_phys_bits doesn't hold physical address bits reported
> > > > by CPUID anymore.
> > > 
> > > This neglects to mention the most relevant tidbit of information in terms
> > > of justification for this patch: the number of bits stolen for MKTME is
> > > programmed by BIOS, i.e. bits may be repurposed for MKTME regardless of
> > > kernel support.
> > 
> > I can add BIOS part. But the key issue is kernel adjusts
> > boot_cpu_data.x86_phys_bits, isn't it?
> > 
> > If kernel doesn't adjust boot_cpu_data.x86_phys_bits then this patch
> > theoretically is not needed?
> 
> True, but the context matters, e.g. readers might wonder why this code
> doesn't simply check a feature flag to see if MKTME is enabled.  Knowing
> that PA bits can be repurposed regardless of (full) kernel support is just
> as important as knowing that the kernel adjusts boot_cpu_data.x86_phys_bits.

Fine.

[...]

> > > > +
> > > > 
> > > >  static void mmu_spte_set(u64 *sptep, u64 spte);
> > > > 
> > > >  static union kvm_mmu_page_role
> > > > @@ -303,6 +319,34 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
> > > > 
> > > > +void kvm_set_mmio_spte_mask(void)
> > > 
> > > Moving this to mmu.c makes sense, but calling it from kvm_arch_init() is
> > > silly.  The current call site is immediately after kvm_mmu_module_init(),
> > > I don't see any reason not to just move the call there.
> > 
> > I don't know whether calling kvm_set_mmio_spte_mask() from kvm_arch_init() is
> > silly but maybe there's history -- KVM calls kvm_mmu_set_mask_ptes() from
> > kvm_arch_init() too, which may also be silly according to your judgement. I
> > have no problem calling kvm_set_mmio_spte_mask() from kvm_mmu_module_init(),
> > but IMHO this logic is irrelevant to this patch, and it's better to have a
> > separate patch for this purpose if necessary?
> 
> A separate patch would be fine, but I would do it as a prereq, i.e. move
> the function first and modify it second.  That would help review the
> functional changes.

I think I am still a little bit confused. Assuming you want to call
kvm_set_mmio_spte_mask() from kvm_mmu_module_init(), right after
kvm_mmu_reset_all_pte_masks() is called, does this change actually fix
anything related to this issue? Or what exactly are you suggesting, if I
am misunderstanding?

Btw, I think calling kvm_set_mmio_spte_mask() from kvm_arch_init() has
benefits: we could call it after kvm_x86_ops has been set up, and then do
kvm_x86_ops->xxx() inside it if needed. And we also have
kvm_mmu_set_mask_ptes(), which does a similar thing to
kvm_set_mmio_spte_mask(). I don't see why we should call
kvm_set_mmio_spte_mask() in kvm_mmu_module_init() but leave
kvm_mmu_set_mask_ptes() outside.
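That said, just to confirm I understand the prereq you are suggesting, is
it simply the call-site move below? Untested sketch, with everything
except the two calls trimmed, and kvm_set_mmio_spte_mask() itself kept
unchanged:

int kvm_mmu_module_init(void)
{
	kvm_mmu_reset_all_pte_masks();

	/* Moved here from kvm_arch_init() as a separate prereq patch. */
	kvm_set_mmio_spte_mask();

	/* ... rest of the existing init (caches, shrinker, etc.) unchanged ... */
	return 0;
}

If that is all you mean then I have no problem doing it as a prereq.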
[...]

> > > {MK}TME provides a CPUID feature bit and MSRs to query support, and this
> > > code only runs at module init so the overhead of a RDMSR is negligible.
> > 
> > Yes can check against CPUID feature bit but not sure why it is better
> > than CPU vendor.
> 
> It helps readers understand the code flow, i.e. it's a mental cue that the
> bit stealing only applies to MKTME.  Checking for "Intel" could easily be
> misinterpreted as "x86_phys_bits isn't accurate for Intel CPUs."

OK. Fine with me.

> > > > +		shadow_first_rsvd_bit = boot_cpu_data.x86_phys_bits;
> > > > +	else
> > > > +		shadow_first_rsvd_bit = kvm_get_cpuid_phys_bits();
> > > 
> > > If you rename the helper to be less specific and actually check for MKTME
> > > support then the MKTME comment doesn't need to be as verbose, e.g.:
> > > 
> > > static u8 kvm_get_shadow_phys_bits(void)
> > > {
> > > 	if (!<has MKTME> ||
> > > 	    WARN_ON_ONCE(boot_cpu_data.extended_cpuid_level < 0x80000008))
> > > 		return boot_cpu_data.x86_phys_bits;
> > 
> > Why do we need WARN_ON_ONCE here?
> 
> Because KVM would essentially be consuming known bad data since we've
> already established that 'x86_phys_bits' is wrong when MKTME is enabled.
> I.e. we should never encounter a platform with MKTME but not CPUID leaf
> 0x80000008.

Can we assume the compiler won't evaluate the WARN_ON_ONCE() part before
checking !<has MKTME>?

> > I don't have problem using as you suggested, but I don't get why checking
> > against CPU vendor is last resort?
> 
> A few reasons off the top of my head:
> 
>  - CPU vendor is less precise, e.g. by checking for MKTME support KVM can
>    WARN if it's consuming known bad data.
> 
>  - Checking for features makes the code self-documenting to some extent.
> 
>  - Multiple vendors may support a feature, now or in the future.  E.g. the
>    Hygon case is a great example.

OK.

> > > 
> > > 	/*
> > > 	 * MKTME steals physical address bits for key IDs, but the key ID bits
> > > 	 * are not treated as reserved.  x86_phys_bits is adjusted to account
> > > 	 * for the stolen bits, use CPUID.MAX_PA_WIDTH directly which reports
> > > 	 * the number of software-available bits irrespective of MKTME.
> > > 	 */
> > > 	return cpuid_eax(0x80000008) & 0xff;
> > > }
> > > 
> > > > +
> > > > +	/*
> > > > +	 * Only Intel is impacted by L1TF, therefore for AMD and other x86
> > > > +	 * vendors L1TF mitigation is not needed.
> > > > +	 *
> > > > +	 * For Intel CPU, if it has 46 or less physical address bits, then set
> > > > +	 * an appropriate mask to guard against L1TF attacks. Otherwise, it is
> > > >  	 * assumed that the CPU is not vulnerable to L1TF.
> > > >  	 */
> > > > +	if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
> > > 
> > > Again, checking vendors is bad.  Not employing a mitigation technique due
> > > to an assumption about what processors are affected by a vulnerability is
> > > even worse.
> > 
> > Why, isn't it a fact that L1TF only impacts Intel CPUs? What's the benefit of
> > employing a mitigation to CPUs that don't have L1TF issue? Such mitigation
> > only makes performance worse, even not noticeable. For example, why not
> > employing such mitigation regardless to physical address bits at all?
> 
> This particular mitigation has minimal effect on performance, e.g. adds
> a few SHL/SHR/AND/OR operations, and that minimal overhead is incurred
> regardless of whether or not reserved bits are set.  I.e. KVM is paying
> the penalty no matter what, so being extra paranoid is "free".

> > > The existing code makes the assumption regarding processors
> > > with >46 bits of address space because a) no such processor existed before
> > > the discovery of L1TF, and it's reasonable to assume hardware vendors
> > > won't ship future processors with such an obvious vulnerability, and b)
> > > hardcoding the number of reserved bits to set simplifies the code.
> > 
> > Yes, but we cannot simply use 'shadow_phys_bits' to check against 46 anymore,
> > right?
> > 
> > For example, if AMD has 52 phys bits, but it reduces 5 or more bits, then
> > current KVM code would employ l1tf mitigation, but actually it really
> > shouldn't?
> 
> What do you mean by "shouldn't"?  Sure, it's not absolutely necessary, but
> again setting the bits is free since adjusting the GPA is hardcoded into
> mark_mmio_spte() and get_mmio_spte_gfn().

OK. I am not sure whether the CPU behaves any differently for x << 5 vs.
x << 0, but this is really trivial, so I will do what you suggested.

Also, I found there's a (software) CPU bug flag, X86_BUG_L1TF. Do you think
we should check boot_cpu_has(X86_BUG_L1TF), and not set up
shadow_nonpresent_or_rsvd_mask if it isn't set?
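I.e. something like below (untested; reusing the need_l1tf and
shadow_first_rsvd_bit names from the current patch, and using
boot_cpu_has_bug(), which appears to be the usual accessor for the bug
bits):

	if (boot_cpu_has_bug(X86_BUG_L1TF) &&
	    (shadow_first_rsvd_bit <
	     52 - shadow_nonpresent_or_rsvd_mask_len))
		need_l1tf = true;
	else
		need_l1tf = false;

That way the mask setup is skipped entirely on parts that are not
affected, while staying unconditional on anything that is.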
> > > > +	    (shadow_first_rsvd_bit <
> > > > +	     52 - shadow_nonpresent_or_rsvd_mask_len))
> > > > +		need_l1tf = true;
> > > > +	else
> > > > +		need_l1tf = false;
> > > > +
> > > >  	low_phys_bits = boot_cpu_data.x86_phys_bits;
> > > > -	if (boot_cpu_data.x86_phys_bits <
> > > > -	    52 - shadow_nonpresent_or_rsvd_mask_len) {
> > > > +	shadow_nonpresent_or_rsvd_mask = 0;
> > > > +	if (need_l1tf) {
> > > >  		shadow_nonpresent_or_rsvd_mask =
> > > >  			rsvd_bits(boot_cpu_data.x86_phys_bits -
> > > >  				  shadow_nonpresent_or_rsvd_mask_len,
> > > 
> > > This is broken, the reserved bits mask is being calculated with the wrong
> > > number of physical bits.  I think fixing this would eliminate the need for
> > > the high_gpa_offset shenanigans.
> > 
> > You are right. Should use 'shadow_phys_bits' instead. Thanks. Let me think
> > whether high_gpa_offset can be avoided.
> > 
> > > > @@ -4326,7 +4422,7 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu,
> > > >  static void
> > > >  __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
> > > >  			struct rsvd_bits_validate *rsvd_check,
> > > > -			int maxphyaddr, int level, bool nx, bool gbpages,
> > > > +			int first_rsvd_bit, int level, bool nx, bool gbpages,
> > > >  			bool pse, bool amd)
> > > 
> > > Similar to the earlier comment regarding 'first', it's probably less
> > > confusing overall to just leave this as 'maxphyaddr'.
> > 
> > Both work for me. But maybe 'non_rsvd_maxphyaddr' is better?
> 
> Maybe?  My personal preference would be to stay with maxphyaddr.

Fine, I'll stay with maxphyaddr.

Btw, which one do you think is better: shadow_phys_bits, or, more
explicitly, shadow_non_rsvd_phys_bits?

Thanks,

-Kai
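P.S. For the "<has MKTME>" check in kvm_get_shadow_phys_bits() I am
thinking of simply keying off the existing TME CPUID feature flag, roughly
like below (untested; assuming X86_FEATURE_TME is a good enough proxy for
"PA bits may have been repurposed for key IDs"):

static u8 kvm_get_shadow_phys_bits(void)
{
	if (!boot_cpu_has(X86_FEATURE_TME) ||
	    WARN_ON_ONCE(boot_cpu_data.extended_cpuid_level < 0x80000008))
		return boot_cpu_data.x86_phys_bits;

	/*
	 * MKTME steals physical address bits for key IDs, but the key ID bits
	 * are not treated as reserved.  x86_phys_bits is adjusted to account
	 * for the stolen bits, so use CPUID.MAX_PA_WIDTH directly, which
	 * reports the number of software-available bits irrespective of MKTME.
	 */
	return cpuid_eax(0x80000008) & 0xff;
}

Please let me know if you would rather see a stricter check, e.g. actually
querying the MKTME activation MSR.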