Re: [PATCH v2] KVM: x86: Implement PCID/INVPCID for guests with EPT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 17, 2012 at 01:22:05AM +0000, Mao, Junjie wrote:
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@xxxxxxxxxx]
> > Sent: Tuesday, May 15, 2012 10:42 AM
> > To: Mao, Junjie
> > Cc: 'kvm@xxxxxxxxxxxxxxx'
> > Subject: Re: [PATCH v2] KVM: x86: Implement PCID/INVPCID for guests with
> > EPT
> > 
> > On Mon, May 14, 2012 at 06:25:40AM +0000, Mao, Junjie wrote:
> > > This patch handles PCID/INVPCID for guests.
> > >
> > > Process-context identifiers (PCIDs) are a facility by which a logical
> > > processor may cache information for multiple linear-address spaces so
> > > that the processor may retain cached information when software
> > > switches to a different linear address space. Refer to section 4.10.1
> > > in IA32 Intel Software Developer's Manual Volume 3A for details.
> > >
> > > For guests with EPT, the PCID feature is enabled and INVPCID behaves
> > > as running natively.
> > > For guests without EPT, the PCID feature is disabled and INVPCID triggers
> > #UD.
> > >
> > > Changes from v1:
> > > 	Move cr0/cr4 writing checks to x86.c
> > > 	Update comments for the reason why PCID is disabled for non-EPT guests
> > > 	Do not support PCID/INVPCID for nested guests at present
> > > 	Clean up useless symbols
> > >
> > > Signed-off-by: Junjie Mao <junjie.mao@xxxxxxxxx>
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -1711,6 +1711,18 @@ static bool vmx_rdtscp_supported(void)
> > >  	return cpu_has_vmx_rdtscp();
> > >  }
> > >
> > > +static bool vmx_pcid_supported(void)
> > > +{
> > > +	/*
> > > +	 * Enable INVPCID for non-ept guests may cause performance regression,
> > > +	 * and without INVPCID, PCID has little benefits. So disable them all
> > > +	 * for non-ept guests.
> > > +	 *
> > > +	 * PCID is not supported for nested guests yet.
> > > +	 */
> > > +	return enable_ept && (boot_cpu_data.x86_capability[4] &
> > > +bit(X86_FEATURE_PCID)) && !cpu_has_hypervisor; }
> > 
> > The comment Avi made was regarding running a nested guest, not running
> > _as_ a nested guest (which is what cpu_has_hypervisor is about).
> > 
> > You can disable INVPCID exec control (which #UDs), if its in Level-2 guest mode
> > (see if_guest_mode()), and restore the Level-1 value when leaving nested
> > mode.
> 
> This "!cpu_has_hypervisor " is brought by my ignorance on nested vmx. Sorry for that.

That makes two of us. No problem.

> > > +
> > >  /*
> > >   * Swap MSR entry in host/guest MSR entry array.
> > >   */
> > > @@ -2425,6 +2437,8 @@ static __init int setup_vmcs_config(struct
> > vmcs_config *vmcs_conf)
> > >  			SECONDARY_EXEC_UNRESTRICTED_GUEST |
> > >  			SECONDARY_EXEC_PAUSE_LOOP_EXITING |
> > >  			SECONDARY_EXEC_RDTSCP;
> > > +		if (vmx_pcid_supported())
> > > +			opt2 |= SECONDARY_EXEC_ENABLE_INVPCID;
> > 
> > 
> > You should allow ENABLE_INVPCID control to be set unconditionally here, and
> > adjusted in vmx_secondary_exec_control().
> > 
> > (note that "enable_ept" might be cleared after setup_vmcs_config).
> > 
> > >  		if (adjust_vmx_controls(min2, opt2,
> > >  					MSR_IA32_VMX_PROCBASED_CTLS2,
> > >  					&_cpu_based_2nd_exec_control) < 0) @@ -6420,6
> > +6434,20 @@ static
> > > void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > >  			}
> > >  		}
> > >  	}
> > > +
> > > +	if (vmx_pcid_supported()) {
> > > +		best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
> > > +		if (!best || !(best->ecx & bit(X86_FEATURE_PCID))) {
> > > +			/* Hiding INVPCID when PCID is not exposed. */
> > > +			exec_control =
> > vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> > > +			exec_control &= ~SECONDARY_EXEC_ENABLE_INVPCID;
> > > +			vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
> > > +				     exec_control);
> > > +			best = kvm_find_cpuid_entry(vcpu, 0x7, 0);
> > > +			if (best)
> > > +				best->ecx &= ~bit(X86_FEATURE_INVPCID);
> > > +		}
> > > +	}
> > 
> > - If X86_FEATURE_CPUID bit is set by guest, but X86_FEATURE_INVPCID is
> >   cleared, this allows invpcid to execute (which is wrong, it should
> >   #UD).
> 
> Qemu doesn't support adjusting cpuid leaf 7 at present. The PCID/INVPCID features are both controlled by the 'pcid' bit in cpuid leaf 1, i.e.
> 	- On platforms supporting both the features, 'cpu xxx,+pcid' exposing both PCID and INVPCID and 'cpu xxx,-pcid' hiding both.
> 	- On platforms supporting PCID only, INVPCID is always not available to guests and PCID is exposed or hidden according to the command line.
> 	- On platforms supporting neither, both of them are always hidden.
> This is done because PCID brings little (if any) benefits without the INVPCID instruction.
> Thus if X86_FEATURE_INVPCID is cleared at this point, this means the host doesn't support it and setup_vmcs_config has already masked SECONDARY_VM_EXEC_CONTROL according to IA32_VMX_PROCBASED_CTLS2.

QEMU is not the only user of KVM. Just

if (X86_FEATURE_INVPCID OR X86_FEATURE_INVPCID) are unset
    clear SECONDARY_VM_EXEC_CONTROL

Independently of the facts above, this makes the code more robust.

> > - Must enable vm_exec control bit if cpuid reports the feature enabled,
> >   not only disable it.
> 
> The bit is default to be set if the host supports it. The code here is only used when qemu explicitly disables PCID on platforms supporting the feature.

Userspace is free to call ioctl(SET_CPUID) in any order and number of 
times it pleases. Please support both enable and disable (it is easy,
see RDTSCP code in the function).

> > >  }
> > >
> > >  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2
> > > *entry) @@ -7154,6 +7182,7 @@ static struct kvm_x86_ops vmx_x86_ops =
> > {
> > >  	.cpuid_update = vmx_cpuid_update,
> > >
> > >  	.rdtscp_supported = vmx_rdtscp_supported,
> > > +	.pcid_supported = vmx_pcid_supported,
> > >
> > >  	.set_supported_cpuid = vmx_set_supported_cpuid,
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index
> > > c9d99e5..f930597 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -527,6 +527,10 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned
> > long cr0)
> > >  			return 1;
> > >  	}
> > >
> > > +	if ((old_cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PG) &&
> > > +	    kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
> > > +		return 1;
> > > +
> > >  	kvm_x86_ops->set_cr0(vcpu, cr0);
> > 
> > For completeness, kvm_set_cr3 should deal with the CR3 bits. It is used by
> > nested VMX for example.
> 
> I think you're talking about the reserved bit checks. I'll update it in the next version.

Yes, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux