On 25/09/2018 22:20, Sean Christopherson wrote:
> One defense against L1TF in KVM is to always set the upper five bits
> of the *legal* physical address in the SPTEs for non-present and
> reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the
> MMIO SPTE may overlap with the upper five bits that are being usurped
> to defend against L1TF. To preserve the GFN, the bits of the GFN that
> overlap with the repurposed bits are shifted left into the reserved
> bits, i.e. the GFN in the SPTE will be split into high and low parts.
> When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO
> access, get_mmio_spte_gfn() unshifts the affected bits and restores
> the original GFN for comparison. Unfortunately, get_mmio_spte_gfn()
> neglects to mask off the reserved bits in the SPTE that were used to
> store the upper chunk of the GFN. As a result, KVM fails to detect
> MMIO accesses whose GPA overlaps the repurposed bits, which in turn
> causes guest panics and hangs.
> 
> Fix the bug by generating a mask that covers the lower chunk of the
> GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The
> alternative approach would be to explicitly zero the five reserved
> bits that are used to store the upper chunk of the GFN, but that
> requires additional run-time computation and makes an already-ugly
> bit of code even more inscrutable.
> 
> I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to
> warn if GENMASK_ULL() generated a nonsensical value, but that seemed
> silly since that would mean a system that supports VMX has less than
> 18 bits of physical address space...
> 
> Reported-by: Sakari Ailus <sakari.ailus@xxxxxx>
> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
> Cc: Junaid Shahid <junaids@xxxxxxxxxx>
> Cc: Jim Mattson <jmattson@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> ---
> 
> v2: Changed the mask to cover only the GFN and tweaked variable
>     names to better reflect this behavior [Junaid Shahid].
> 
> I didn't include Jim and Junaid's "Reviewed-by" tags in the changelog
> because v2 reworks the calculation of the mask, which is the part of
> this change that I'm most likely to screw up (math is hard). I pasted
> them below in case you feel it's appropriate to keep them.
> 
> Reviewed-by: Junaid Shahid <junaids@xxxxxxxxxx>
> Reviewed-by: Jim Mattson <jmattson@xxxxxxxxxx>
> 
>  arch/x86/kvm/mmu.c | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a282321329b5..a04084cb4b7b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -249,6 +249,17 @@ static u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
>   */
>  static const u64 shadow_nonpresent_or_rsvd_mask_len = 5;
>  
> +/*
> + * In some cases, we need to preserve the GFN of a non-present or reserved
> + * SPTE when we usurp the upper five bits of the physical address space to
> + * defend against L1TF, e.g. for MMIO SPTEs. To preserve the GFN, we'll
> + * shift bits of the GFN that overlap with shadow_nonpresent_or_rsvd_mask
> + * left into the reserved bits, i.e. the GFN in the SPTE will be split into
> + * high and low parts. This mask covers the lower bits of the GFN.
> + */
> +static u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
> +
> +
>  static void mmu_spte_set(u64 *sptep, u64 spte);
>  static union kvm_mmu_page_role
>  kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
> @@ -357,9 +368,7 @@ static bool is_mmio_spte(u64 spte)
>  
>  static gfn_t get_mmio_spte_gfn(u64 spte)
>  {
> -        u64 mask = generation_mmio_spte_mask(MMIO_GEN_MASK) | shadow_mmio_mask |
> -                   shadow_nonpresent_or_rsvd_mask;
> -        u64 gpa = spte & ~mask;
> +        u64 gpa = spte & shadow_nonpresent_or_rsvd_lower_gfn_mask;
>  
>          gpa |= (spte >> shadow_nonpresent_or_rsvd_mask_len)
>                 & shadow_nonpresent_or_rsvd_mask;
> @@ -423,6 +432,8 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
>  
>  static void kvm_mmu_reset_all_pte_masks(void)
>  {
> +        u8 low_phys_bits;
> +
>          shadow_user_mask = 0;
>          shadow_accessed_mask = 0;
>          shadow_dirty_mask = 0;
> @@ -437,12 +448,17 @@ static void kvm_mmu_reset_all_pte_masks(void)
>           * appropriate mask to guard against L1TF attacks. Otherwise, it is
>           * assumed that the CPU is not vulnerable to L1TF.
>           */
> +        low_phys_bits = boot_cpu_data.x86_phys_bits;
>          if (boot_cpu_data.x86_phys_bits <
> -            52 - shadow_nonpresent_or_rsvd_mask_len)
> +            52 - shadow_nonpresent_or_rsvd_mask_len) {
>                  shadow_nonpresent_or_rsvd_mask =
>                          rsvd_bits(boot_cpu_data.x86_phys_bits -
>                                    shadow_nonpresent_or_rsvd_mask_len,
>                                    boot_cpu_data.x86_phys_bits - 1);
> +                low_phys_bits -= shadow_nonpresent_or_rsvd_mask_len;
> +        }
> +        shadow_nonpresent_or_rsvd_lower_gfn_mask =
> +                GENMASK_ULL(low_phys_bits - 1, PAGE_SHIFT);
>  }
>  
>  static int is_cpuid_PSE36(void)

Queued, thanks.

Paolo
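For readers who want to poke at the masking arithmetic, below is a minimal standalone userspace sketch of the split/restore scheme described in the changelog. The constants (46 physical address bits, 5 repurposed bits, 4 KiB pages) and the helpers split_gfn_bits()/restore_gpa() are invented for illustration only; they mirror the idea behind mark_mmio_spte() and the fixed get_mmio_spte_gfn(), not the actual kernel code.

/*
 * Standalone illustration of the GFN split/restore scheme discussed above.
 * All names and constants below (PHYS_BITS, RSVD_LEN, split_gfn_bits(),
 * restore_gpa()) are made up for this sketch; they only mirror the idea
 * behind mark_mmio_spte()/get_mmio_spte_gfn(), they are not KVM code.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same shape as the kernel's GENMASK_ULL(): bits h..l set, inclusive. */
#define GENMASK_ULL(h, l)  ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define PHYS_BITS   46  /* assumed boot_cpu_data.x86_phys_bits */
#define RSVD_LEN     5  /* shadow_nonpresent_or_rsvd_mask_len  */
#define PAGE_SHIFT  12  /* assumed 4 KiB pages                 */

/* Upper five legal physical address bits, set to 1 as the L1TF defense. */
static const uint64_t rsvd_mask =
        GENMASK_ULL(PHYS_BITS - 1, PHYS_BITS - RSVD_LEN);

/* Bits of the GFN the mitigation leaves in place (the new lower-GFN mask). */
static const uint64_t lower_gfn_mask =
        GENMASK_ULL(PHYS_BITS - RSVD_LEN - 1, PAGE_SHIFT);

/* Encode the address bits of a "non-present" MMIO-style SPTE for a GPA. */
static uint64_t split_gfn_bits(uint64_t gpa)
{
        uint64_t spte = gpa & lower_gfn_mask;      /* low chunk stays put     */

        spte |= (gpa & rsvd_mask) << RSVD_LEN;     /* high chunk -> rsvd bits */
        return spte | rsvd_mask;                   /* L1TF defense bits set   */
}

/* Decode again, i.e. what the fixed get_mmio_spte_gfn() has to do. */
static uint64_t restore_gpa(uint64_t spte)
{
        uint64_t gpa = spte & lower_gfn_mask;      /* drops the rsvd bits     */

        gpa |= (spte >> RSVD_LEN) & rsvd_mask;     /* unshift the high chunk  */
        return gpa;
}

int main(void)
{
        /* A GPA whose GFN overlaps the repurposed upper bits. */
        uint64_t gpa = (1ULL << (PHYS_BITS - 2)) | 0xabc000;

        assert(restore_gpa(split_gfn_bits(gpa)) == gpa);
        printf("gpa %#llx survives the split SPTE layout\n",
               (unsigned long long)gpa);
        return 0;
}

In this sketch the "& lower_gfn_mask" in restore_gpa() plays the role of the fix: it drops both the all-ones defense bits and the shifted-up copy of the high chunk, so only the unshifted low chunk plus the restored high chunk make it into the reconstructed GPA.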