Re: [PATCH v5 10/14] nEPT: Add nEPT violation/misconfiguration support

On Thu, Aug 01, 2013 at 01:19:11PM +0200, Paolo Bonzini wrote:
>  On Aug 01 2013, Gleb Natapov wrote:
> > On Thu, Aug 01, 2013 at 04:31:31PM +0800, Xiao Guangrong wrote:
> > > On 07/31/2013 10:48 PM, Gleb Natapov wrote:
> > > > From: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> > > > 
> > > > Inject nEPT fault to L1 guest. This patch is originally from Xinhao.
> > > > 
> > > > Signed-off-by: Jun Nakajima <jun.nakajima@xxxxxxxxx>
> > > > Signed-off-by: Xinhao Xu <xinhao.xu@xxxxxxxxx>
> > > > Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> > > > Signed-off-by: Gleb Natapov <gleb@xxxxxxxxxx>
> > > > ---
> > > >  arch/x86/include/asm/kvm_host.h |    4 ++++
> > > >  arch/x86/kvm/mmu.c              |   34 ++++++++++++++++++++++++++++++++++
> > > >  arch/x86/kvm/paging_tmpl.h      |   28 ++++++++++++++++++++++++----
> > > >  arch/x86/kvm/vmx.c              |   17 +++++++++++++++++
> > > >  4 files changed, 79 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 531f47c..58a17c0 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -286,6 +286,7 @@ struct kvm_mmu {
> > > >  	u64 *pae_root;
> > > >  	u64 *lm_root;
> > > >  	u64 rsvd_bits_mask[2][4];
> > > > +	u64 bad_mt_xwr;
> > > > 
> > > >  	/*
> > > >  	 * Bitmap: bit set = last pte in walk
> > > > @@ -512,6 +513,9 @@ struct kvm_vcpu_arch {
> > > >  	 * instruction.
> > > >  	 */
> > > >  	bool write_fault_to_shadow_pgtable;
> > > > +
> > > > +	/* set at EPT violation at this point */
> > > > +	unsigned long exit_qualification;
> > > >  };
> > > > 
> > > >  struct kvm_lpage_info {
> > > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > > > index 3df3ac3..58ae9db 100644
> > > > --- a/arch/x86/kvm/mmu.c
> > > > +++ b/arch/x86/kvm/mmu.c
> > > > @@ -3521,6 +3521,8 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
> > > >  	int maxphyaddr = cpuid_maxphyaddr(vcpu);
> > > >  	u64 exb_bit_rsvd = 0;
> > > > 
> > > > +	context->bad_mt_xwr = 0;
> > > > +
> > > >  	if (!context->nx)
> > > >  		exb_bit_rsvd = rsvd_bits(63, 63);
> > > >  	switch (context->root_level) {
> > > > @@ -3576,6 +3578,38 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
> > > >  	}
> > > >  }
> > > > 
> > > > +static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
> > > > +		struct kvm_mmu *context, bool execonly)
> > > > +{
> > > > +	int maxphyaddr = cpuid_maxphyaddr(vcpu);
> > > > +	int pte;
> > > > +
> > > > +	context->rsvd_bits_mask[0][3] =
> > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
> > > > +	context->rsvd_bits_mask[0][2] =
> > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
> > > > +	context->rsvd_bits_mask[0][1] =
> > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
> > > > +	context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
> > > > +
> > > > +	/* large page */
> > > > +	context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
> > > > +	context->rsvd_bits_mask[1][2] =
> > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
> > > > +	context->rsvd_bits_mask[1][1] =
> > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 20);
> > > > +	context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
> > > > +	
> > > > +	for (pte = 0; pte < 64; pte++) {
> > > > +		int rwx_bits = pte & 7;
> > > > +		int mt = pte >> 3;
> > > > +		if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||
> > > > +				rwx_bits == 0x2 || rwx_bits == 0x6 ||
> > > > +				(rwx_bits == 0x4 && !execonly))
> > > > +			context->bad_mt_xwr |= (1ull << pte);
> > > > +	}
> > > > +}
> > > > +
> > > >  static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> > > >  {
> > > >  	unsigned bit, byte, pfec;
> > > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> > > > index 0d25351..ed6773e 100644
> > > > --- a/arch/x86/kvm/paging_tmpl.h
> > > > +++ b/arch/x86/kvm/paging_tmpl.h
> > > > @@ -127,12 +127,13 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
> > > >  	*access &= mask;
> > > >  }
> > > > 
> > > > -static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
> > > > +static bool inline FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte,
> > > > +		int level)
> > > 
> > > Not sure this explicit "inline" is needed... Gcc always inlines the small and
> > > static functions.
> > 
> > Paolo asked for it, but now I see that I did it in the wrong patch. I do
> > not care much personally either way; I agree with you, though, that the
> > compiler will inline it anyway.
> 
> The point here was that if we use "||" below (or multiple "if"s as I
> suggested in my review), we really want to inline the function to ensure
> that the branch here is merged with the one in the caller's "if()".
> 
> With the "|" there is not much effect.
> 
Even with if(), do you really think there is a chance the function will not
be inlined? I see much, much bigger functions being inlined.
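For concreteness, here is a minimal user-space sketch of the "||" vs "|"
distinction under discussion (hypothetical stand-alone code, not the kernel
function itself):

  #include <stdbool.h>
  #include <stdint.h>

  /* With "||" the second operand is evaluated only when the first one is
   * zero, which typically compiles to a conditional branch; with "|" both
   * operands are always evaluated and combined branchlessly.  Inlining is
   * what would let the compiler merge such a branch with the caller's
   * if(). */
  static bool rsvd_check_logical(uint64_t a, uint64_t b)
  {
          return (a != 0) || (b != 0);    /* short-circuit: may branch */
  }

  static bool rsvd_check_bitwise(uint64_t a, uint64_t b)
  {
          return ((a != 0) | (b != 0));   /* evaluates both: branch-free */
  }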

> > > >  {
> > > > -	int bit7;
> > > > +	int bit7 = (gpte >> 7) & 1, low5 = gpte & 0x3f;
> 
> Low 6, actually.
Well, the mask is right (bad_mt_xwr is indexed by all six MT/XWR bits), so
the bug is in the name: it should be low6. Good catch :)

> 
> > > > -	bit7 = (gpte >> 7) & 1;
> > > > -	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
> > > > +	return ((gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0) |
> > > > +		((mmu->bad_mt_xwr & (1ull << low5)) != 0);
> 
> If you really want to optimize this thing to avoid branches, you can
> also change it to
> 
>    return ((gpte & mmu->rsvd_bits_mask[bit7][level-1]) |
> 		(mmu->bad_mt_xwr & (1ull << low5))) != 0;
> 
Why not drop the remaining "!= 0" as well, then?
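For the record, with the "!= 0" dropped and the variable renamed to low6,
the whole helper would collapse to roughly this sketch (C's implicit
conversion to bool turns any non-zero value into true):

  static inline bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte,
  		int level)
  {
  	int bit7 = (gpte >> 7) & 1, low6 = gpte & 0x3f;

  	/* Returning through bool makes any non-zero value true, so no
  	 * explicit "!= 0" is needed at all. */
  	return (gpte & mmu->rsvd_bits_mask[bit7][level - 1]) |
  		(mmu->bad_mt_xwr & (1ull << low6));
  }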

> and consider adding the bits to bad_mt_xwr to detect non-presence
> at the same time as reserved bits (as non-presence is also tested
> near both callers of is_rsvd_bits_set).
We need to distinguish between the two in at least one call site.
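For concreteness, folding non-presence into the table as Paolo suggests
would only mean adding the rwx == 0 combinations to the loop in the patch,
along these lines (a sketch, not the posted code):

  	for (pte = 0; pte < 64; pte++) {
  		int rwx_bits = pte & 7;
  		int mt = pte >> 3;
  		/* rwx == 0 means the EPT entry is not present; marking
  		 * those combinations as well lets the single bad_mt_xwr
  		 * lookup catch non-present entries together with
  		 * misconfigured ones. */
  		if (rwx_bits == 0 ||
  		    mt == 0x2 || mt == 0x3 || mt == 0x7 ||
  		    rwx_bits == 0x2 || rwx_bits == 0x6 ||
  		    (rwx_bits == 0x4 && !execonly))
  			context->bad_mt_xwr |= (1ull << pte);
  	}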

> 
> On the other hand, it starts to look like useless complication not
> backed by any benchmark (and probably unmeasurable anyway).  Neither of
> the conditions is particularly easy to compute, and they are probably
> more expensive than a well-predicted branch.  Thus, in this case I
> would prefer to have clearer code and just use two "if"s, the second
> of which can be guarded so that it runs only for EPT (it doesn't have to
> be an #ifdef, no?).

return a | b

is less clear than:

if (a)
 return true;
if (b)
 return true;

?
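Spelled out, the two-"if" variant would read roughly like this sketch
(with the low6 rename applied):

  	if (gpte & mmu->rsvd_bits_mask[bit7][level - 1])
  		return true;
  	/* Only EPT entries have MT/XWR bits; for the other paging modes
  	 * bad_mt_xwr is simply 0, so this second test could also be
  	 * guarded per paging mode instead of with an #ifdef. */
  	if (mmu->bad_mt_xwr & (1ull << low6))
  		return true;
  	return false;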

> 
> If you do not want to do that, then now that this is checked on non-EPT
> PTEs as well, shouldn't we rename it to something else like
> bad_low_six_bits?  Regular page tables have no MT and XWR bits.
> 
I think the current name describes the purpose of the field better: it
makes clear that for non-EPT page tables it is irrelevant. But if we are
going to use it to detect non-present PTEs for regular page tables too,
the rename makes perfect sense. Let's rename it then.
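For reference, in the EPT case the six bits being tested decode as memory
type (bits 5:3, where types 2, 3 and 7 are reserved) and XWR permissions
(bits 2:0, where 010b and 110b are always illegal and 100b is legal only
with execute-only support). A tiny worked example of the lookup:

  	/* gpte & 0x3f == 0x1c: mt = 0x3 (a reserved memory type) and
  	 * rwx = 100b (execute-only), so bit 0x1c of bad_mt_xwr is set
  	 * by the loop in the patch and the entry is rejected. */
  	int low6 = 0x1c;
  	bool bad = (mmu->bad_mt_xwr >> low6) & 1;	/* true */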

--
			Gleb.