RE: [PATCH 20/31] nVMX: Exiting from L2 to L1

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Tue, 24 May 2011 20:58:29 +0800

> From: Nadav Har'El
> Sent: Tuesday, May 17, 2011 3:54 AM
> 
> This patch implements nested_vmx_vmexit(), called when the nested L2 guest
> exits and we want to run its L1 parent and let it handle this exit.
> 
> Note that this will not necessarily be called on every L2 exit. L0 may decide
> to handle a particular exit on its own, without L1's involvement; In that
> case, L0 will handle the exit, and resume running L2, without running L1 and
> without calling nested_vmx_vmexit(). The logic for deciding whether to handle
> a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(),
> will appear in a separate patch below.
> 
> Signed-off-by: Nadav Har'El <nyh@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx.c |  257
> +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 257 insertions(+)
> 
> --- .before/arch/x86/kvm/vmx.c	2011-05-16 22:36:49.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c	2011-05-16 22:36:49.000000000 +0300
> @@ -6203,6 +6203,263 @@ static int nested_vmx_run(struct kvm_vcp
>  	return 1;
>  }
> 
> +/*
> + * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
> + * because L2 may have changed some cr0 bits directly (see
> CRO_GUEST_HOST_MASK)
> + * without L0 trapping the change and updating vmcs12.
> + * This function returns the value we should put in vmcs12.guest_cr0. It's not
> + * enough to just return the current (vmcs02) GUEST_CR0 - that may not be
> the
> + * guest cr0 that L1 thought it was giving its L2 guest; It is possible that
> + * L1 wished to allow its guest to set some cr0 bit directly, but we (L0) asked
> + * to trap this change and instead set just the read shadow bit. If this is the
> + * case, we need to copy these read-shadow bits back to vmcs12.guest_cr0,
> where
> + * L1 believes they already are.
> + */
> +static inline unsigned long
> +vmcs12_guest_cr0(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> +{
> +	/*
> +	 * As explained above, we take a bit from GUEST_CR0 if we allowed the
> +	 * guest to modify it untrapped (vcpu->arch.cr0_guest_owned_bits), or
> +	 * if we did trap it - if we did so because L1 asked to trap this bit
> +	 * (vmcs12->cr0_guest_host_mask). Otherwise (bits we trapped but L1
> +	 * didn't expect us to trap) we read from CR0_READ_SHADOW.
> +	 */
> +	unsigned long guest_cr0_bits =
> +		vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask;
> +	return (vmcs_readl(GUEST_CR0) & guest_cr0_bits) |
> +	       (vmcs_readl(CR0_READ_SHADOW) & ~guest_cr0_bits);
> +}

Hi, Nadav,

Not sure whether I get above operation wrong. But it looks not exactly correct to me
in a glimpse. Say a bit set both in L0/L1's cr0_guest_host_mask. In such case that
bit from vmcs12_GUEST_CR0 resides in vmcs02_CR0_READ_SHADOW, however above
operation will make vmcs02_GUEST_CR0 bit returned instead.

Instead of constructing vmcs12_GUEST_CR0 completely from vmcs02_GUEST_CR0,
why not just updating bits which can be altered while keeping the rest bits from
vmcs12_GUEST_CR0? Say something like:

vmcs12->guest_cr0 &= vmcs12->cr0_guest_host_mask; /* keep unchanged bits */
vmcs12->guest_cr0 |= (vmcs_readl(GUEST_CR0) & vcpu->arch.cr0_guest_owned_bits) |
	(vmcs_readl(CR0_READ_SHADOW) & ~( vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask))

Thanks
Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html