Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 03/08/2016 07:44 PM, Paolo Bonzini wrote:
Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE.  This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
This enables user reads; it also disables supervisor writes, the next
of which will then flip the bits again.

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.

Good catch!

So it only hurts the box which has cpu_has_load_ia32_efer support otherwise
NX is inherited from kernel (kernel always sets NX if CPU supports it),
right?


There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.


Cc: stable@xxxxxxxxxxxxxxx
Cc: Xiao Guangrong <guangrong.xiao@xxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
  Documentation/virtual/kvm/mmu.txt |  3 ++-
  arch/x86/kvm/vmx.c                | 25 +++++++++++++++----------
  2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f742d2..c81731096a43 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two additional complications:
  - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
    the kernel may now execute it.  We handle this by also setting spte.nx.
    If we get a user fetch or read fault, we'll change spte.u=1 and
-  spte.nx=gpte.nx back.
+  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
+  shadow paging is in use.
  - if CR4.SMAP is disabled: since the page has been changed to a kernel
    page, it can not be reused when CR4.SMAP is enabled. We set
    CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e51493ff4f9..91830809d837 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
  	guest_efer = vmx->vcpu.arch.efer;

  	/*
-	 * NX is emulated; LMA and LME handled by hardware; SCE meaningless
-	 * outside long mode
+	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
  	 */
-	ignore_bits = EFER_NX | EFER_SCE;
+	ignore_bits = EFER_SCE;
  #ifdef CONFIG_X86_64
  	ignore_bits |= EFER_LMA | EFER_LME;
  	/* SCE is meaningful only in long mode on Intel */
  	if (guest_efer & EFER_LMA)
  		ignore_bits &= ~(u64)EFER_SCE;
  #endif
-	guest_efer &= ~ignore_bits;
-	guest_efer |= host_efer & ignore_bits;
-	vmx->guest_msrs[efer_offset].data = guest_efer;
-	vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+	/* NX is needed to handle CR0.WP=1, CR4.SMEP=1.  */

+	if (!enable_ept) {
+		guest_efer |= EFER_NX;
+		ignore_bits |= EFER_NX;

Update ignore_bits is not necessary i think.

+	}

  	clear_atomic_switch_msr(vmx, MSR_EFER);

@@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
  	 */
  	if (cpu_has_load_ia32_efer ||
  	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
-		guest_efer = vmx->vcpu.arch.efer;
  		if (!(guest_efer & EFER_LMA))
  			guest_efer &= ~EFER_LME;
  		if (guest_efer != host_efer)
  			add_atomic_switch_msr(vmx, MSR_EFER,
  					      guest_efer, host_efer);

So, why not set EFER_NX (if !ept) just in this branch to make the fix more simpler?

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]