Re: [PATCH] kvm: x86: Fix several SPTE mask calculation errors caused by MKTME

On Tue, Apr 23, 2019 at 04:38:17PM -0700, Huang, Kai wrote:
> On Tue, 2019-04-23 at 08:08 -0700, Sean Christopherson wrote:
> > On Mon, Apr 22, 2019 at 06:57:01PM -0700, Huang, Kai wrote:
> > > I don't know whether calling kvm_set_mmio_spte_mask() from
> > > kvm_arch_init() is silly but maybe there's history -- KVM calls
> > > kvm_mmu_set_mask_ptes() from kvm_arch_init() too, which may also be silly
> > > according to your judgement. I have no problem calling
> > > kvm_set_mmio_spte_mask() from kvm_mmu_module_init(), but IMHO this logic
> > > is irrelevant to this patch, and it's better to have a separate patch for
> > > this purpose if necessary?
> > 
> > A separate patch would be fine, but I would do it as a prereq, i.e. move
> > the function first and modify it second.  That would help review the
> > functional changes.
> 
> I think I am still a little bit confused. Assuming you want to call
> kvm_set_mmio_spte_mask() from kvm_mmu_module_init(), right after
> kvm_mmu_reset_all_pte_masks() is called, does this change do any real fix
> related to this issue? Or exactly what are you suggesting if I am
> misunderstanding?

Moving kvm_set_mmio_spte_mask() to mmu.c allows it to be a static function
local to mmu.c.  AFAICT the original decision to define and call it in
x86.c was completely arbitrary.  Now that it depends on variables in mmu.c,
it makes sense to move it.

> Btw I think calling kvm_set_mmio_spte_mask() from kvm_arch_init() has
> benefits, since we can change to call it after kvm_x86_ops has been setup,
> and then we can do some kvm_x86_ops->xxx() if needed inside.

Alternatively, @ops could be passed to kvm_mmu_module_init().  Regardless,
using 'struct kvm_x86_ops' for unadjusting the phys bits is probably
overkill, but it might make sense if we're doing RDMSR to process MKTME
information.  I guess I don't have a strong opinion :-)

> And we also have kvm_mmu_set_mask_ptes(), which seems to be similar thing as
> kvm_set_mmio_spte_mask(). I don't see why we should call
> kvm_set_mmio_spte_mask() in kvm_mmu_module_init() but leave
> kvm_mmu_set_mask_ptes() outside. 

True, although kvm_mmu_set_mask_ptes() is pretty clear cut since the call
from VMX when EPT is enabled consumes information that is already available
in VMX-specific code, e.g. A/D bit enabling, whereas the MKTME handling is
new one-off code no matter where it lands.

> > Because KVM would essentially be consuming known bad data since we've
> > already established that 'x86_phys_bits' is wrong when MKTME is enabled.
> > I.e. we should never encounter a platform with MKTME but not CPUID leaf
> > 0x80000008.
> 
> Can you make assumption that compiler won't do WARN_ON_ONCE part before
> checking !<has_mktme>?

Absolutely, it's guaranteed by the C99 spec[1] under '6.5.14 Logical OR
operator'.  Without short-circuit (strict left-to-right) evaluation, idioms
like 'if (ptr && ptr->deref)' and 'if (!ptr || ptr->invalid)' fall apart.

[1] http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
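The guarantee is easy to demonstrate with a self-contained sketch (the
struct and helper names below are illustrative, not actual KVM code):

```c
#include <assert.h>
#include <stddef.h>

struct spte_ctx {
	int invalid;
};

/* Mirrors the 'if (!ptr || ptr->invalid)' idiom: per C99 6.5.14 the
 * right operand of || is evaluated only if the left operand compares
 * equal to 0, so a NULL ctx is never dereferenced. */
static int ctx_is_bad(const struct spte_ctx *ctx)
{
	return !ctx || ctx->invalid;
}
```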

> > What do you mean by "shouldn't"?  Sure, it's not absolutely necessary, but
> > again setting the bits is free since adjusting the GPA is hardcoded into
> > mark_mmio_spte() and get_mmio_spte_gfn().
> 
> OK. I am not sure whether the CPU behaves any differently for x << 5 vs.
> x << 0, but this is really trivial so I will do what you suggested.

'x << 5' vs. 'x << 0' doesn't matter, but ensuring the shift amount is a
compile-time constant absolutely does matter since it allows the compiler
to emit a single 'SH{L,R} reg, imm' instead of having to load CL to do
'SH{L,R} reg, CL'.

E.g. setting the reserved bits compiles to:

constant:
   0x0000000000025ce6 <+86>:    and    %rsi,%rcx
   0x0000000000025cf0 <+96>:    shl    $0x5,%rcx
   0x0000000000025d38 <+168>:   or     %rax,%rcx
   0x0000000000025d43 <+179>:   mov    %rcx,%r15

variable:
   0x0000000000025d74 <+148>:   and    %rcx,%rdi
   0x0000000000025d77 <+151>:   mov    0x0(%rip),%rcx
   0x0000000000025d7e <+158>:   mov    %rdi,%rdx
   0x0000000000025d84 <+164>:   shl    %cl,%rdx
   0x0000000000025d87 <+167>:   mov    %rdx,%rcx
   0x0000000000025d94 <+180>:   or     %rax,%rcx
   0x0000000000025d9f <+191>:   mov    %rcx,%r15
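For reference, a minimal sketch of the two C forms that produce the code
shapes above (the shift value 5 and the function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define RSVD_SHIFT 5	/* hypothetical compile-time-constant shift */

/* Constant shift: the compiler folds the count into the instruction,
 * i.e. 'shl $0x5, %reg'. */
static inline uint64_t shift_const(uint64_t val, uint64_t mask)
{
	return (val & mask) << RSVD_SHIFT;
}

/* Variable shift: on x86-64 the count must be loaded into CL first,
 * i.e. 'shl %cl, %reg', costing an extra load and register shuffle. */
static inline uint64_t shift_var(uint64_t val, uint64_t mask,
				 unsigned int shift)
{
	return (val & mask) << shift;
}
```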

> Or I found there's one CPU bug flag (software) X86_BUG_L1TF, do you think we
> should check whether boot_cpu_has(X86_BUG_L1TF), and don't setup
> shadow_nonpresent_or_rsvd_mask if it doesn't?

No.  Again, the cost is paid regardless, being paranoid is free.

> > > > > +			(shadow_first_rsvd_bit <
> > > > > +				52 - shadow_nonpresent_or_rsvd_mask_len))
> > > > > +		need_l1tf = true;
> > > > > +	else
> > > > > +		need_l1tf = false;
> > > > > +
> > > > >  	low_phys_bits = boot_cpu_data.x86_phys_bits;
> > > > > -	if (boot_cpu_data.x86_phys_bits <
> > > > > -	    52 - shadow_nonpresent_or_rsvd_mask_len) {
> > > > > +	shadow_nonpresent_or_rsvd_mask = 0;
> > > > > +	if (need_l1tf) {
> > > > >  		shadow_nonpresent_or_rsvd_mask =
> > > > >  			rsvd_bits(boot_cpu_data.x86_phys_bits -
> > > > >  				  shadow_nonpresent_or_rsvd_mask_len,
> > > > 
> > > > This is broken, the reserved bits mask is being calculated with the wrong
> > > > number of physical bits.  I think fixing this would eliminate the need for
> > > > the high_gpa_offset shenanigans.
> > > 
> > > You are right, should use 'shadow_phys_bits' instead. Thanks. Let me
> > > think whether high_gpa_offset can be avoided.
> > > 
> > > > 
> > > > > @@ -4326,7 +4422,7 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu,
> > > > >  static void
> > > > >  __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
> > > > >  			struct rsvd_bits_validate *rsvd_check,
> > > > > -			int maxphyaddr, int level, bool nx, bool gbpages,
> > > > > +			int first_rsvd_bit, int level, bool nx, bool gbpages,
> > > > >  			bool pse, bool amd)
> > > > 
> > > > Similar to the earlier comment regarding 'first', it's probably less
> > > > confusing overall to just leave this as 'maxphyaddr'.
> > > 
> > > Both work for me. But maybe 'non_rsvd_maxphyaddr' is better? 
> > 
> > Maybe?  My personal preference would be to stay with maxphyaddr.
> 
> Fine to maxphyaddr. Btw, which one do you think is better, shadow_phys_bits, or more explicitly,
> shadow_non_rsvd_phys_bits?

shadow_phys_bits.  In both cases, I prefer avoiding 'non_rsvd' because it's
redundant in the common (non-MKTME) case and seems more likely to confuse
than clarify.  Again, just personal preference...


