On Thu, Aug 01, 2013 at 02:03:01PM +0200, Paolo Bonzini wrote:
> On Aug 01 2013, Gleb Natapov wrote:
> > On Thu, Aug 01, 2013 at 01:19:11PM +0200, Paolo Bonzini wrote:
> > > On Aug 01 2013, Gleb Natapov wrote:
> > > > On Thu, Aug 01, 2013 at 04:31:31PM +0800, Xiao Guangrong wrote:
> > > > > On 07/31/2013 10:48 PM, Gleb Natapov wrote:
> > > > > > From: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> > > > > >
> > > > > > Inject nEPT fault to L1 guest. This patch is originally from Xinhao.
> > > > > >
> > > > > > Signed-off-by: Jun Nakajima <jun.nakajima@xxxxxxxxx>
> > > > > > Signed-off-by: Xinhao Xu <xinhao.xu@xxxxxxxxx>
> > > > > > Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> > > > > > Signed-off-by: Gleb Natapov <gleb@xxxxxxxxxx>
> > > > > > ---
> > > > > >  arch/x86/include/asm/kvm_host.h |    4 ++++
> > > > > >  arch/x86/kvm/mmu.c              |   34 ++++++++++++++++++++++++++++++++++
> > > > > >  arch/x86/kvm/paging_tmpl.h      |   28 ++++++++++++++++++++++++----
> > > > > >  arch/x86/kvm/vmx.c              |   17 +++++++++++++++++
> > > > > >  4 files changed, 79 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > > index 531f47c..58a17c0 100644
> > > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > @@ -286,6 +286,7 @@ struct kvm_mmu {
> > > > > >  	u64 *pae_root;
> > > > > >  	u64 *lm_root;
> > > > > >  	u64 rsvd_bits_mask[2][4];
> > > > > > +	u64 bad_mt_xwr;
> > > > > >
> > > > > >  	/*
> > > > > >  	 * Bitmap: bit set = last pte in walk
> > > > > > @@ -512,6 +513,9 @@ struct kvm_vcpu_arch {
> > > > > >  	 * instruction.
> > > > > >  	 */
> > > > > >  	bool write_fault_to_shadow_pgtable;
> > > > > > +
> > > > > > +	/* set at EPT violation at this point */
> > > > > > +	unsigned long exit_qualification;
> > > > > >  };
> > > > > >
> > > > > >  struct kvm_lpage_info {
> > > > > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > > > > > index 3df3ac3..58ae9db 100644
> > > > > > --- a/arch/x86/kvm/mmu.c
> > > > > > +++ b/arch/x86/kvm/mmu.c
> > > > > > @@ -3521,6 +3521,8 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
> > > > > >  	int maxphyaddr = cpuid_maxphyaddr(vcpu);
> > > > > >  	u64 exb_bit_rsvd = 0;
> > > > > >
> > > > > > +	context->bad_mt_xwr = 0;
> > > > > > +
> > > > > >  	if (!context->nx)
> > > > > >  		exb_bit_rsvd = rsvd_bits(63, 63);
> > > > > >  	switch (context->root_level) {
> > > > > > @@ -3576,6 +3578,38 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
> > > > > >  	}
> > > > > >  }
> > > > > >
> > > > > > +static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
> > > > > > +				      struct kvm_mmu *context, bool execonly)
> > > > > > +{
> > > > > > +	int maxphyaddr = cpuid_maxphyaddr(vcpu);
> > > > > > +	int pte;
> > > > > > +
> > > > > > +	context->rsvd_bits_mask[0][3] =
> > > > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
> > > > > > +	context->rsvd_bits_mask[0][2] =
> > > > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
> > > > > > +	context->rsvd_bits_mask[0][1] =
> > > > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 6);
> > > > > > +	context->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
> > > > > > +
> > > > > > +	/* large page */
> > > > > > +	context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
> > > > > > +	context->rsvd_bits_mask[1][2] =
> > > > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
> > > > > > +	context->rsvd_bits_mask[1][1] =
> > > > > > +		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 20);
> > > > > > +	context->rsvd_bits_mask[1][0] = context->rsvd_bits_mask[0][0];
> > > > > > +
> > > > > > +	for (pte = 0; pte < 64; pte++) {
> > > > > > +		int rwx_bits = pte & 7;
> > > > > > +		int mt = pte >> 3;
> > > > > > +		if (mt == 0x2 || mt == 0x3 || mt == 0x7 ||
> > > > > > +		    rwx_bits == 0x2 || rwx_bits == 0x6 ||
> > > > > > +		    (rwx_bits == 0x4 && !execonly))
> > > > > > +			context->bad_mt_xwr |= (1ull << pte);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > >  static void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
> > > > > >  {
> > > > > >  	unsigned bit, byte, pfec;
> > > > > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> > > > > > index 0d25351..ed6773e 100644
> > > > > > --- a/arch/x86/kvm/paging_tmpl.h
> > > > > > +++ b/arch/x86/kvm/paging_tmpl.h
> > > > > > @@ -127,12 +127,13 @@ static inline void FNAME(protect_clean_gpte)(unsigned *access, unsigned gpte)
> > > > > >  	*access &= mask;
> > > > > >  }
> > > > > >
> > > > > > -static bool FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte, int level)
> > > > > > +static bool inline FNAME(is_rsvd_bits_set)(struct kvm_mmu *mmu, u64 gpte,
> > > > > > +					   int level)
> > > > >
> > > > > Not sure this explicit "inline" is needed... GCC always inlines small
> > > > > static functions.
> > > >
> > > > Paolo asked for it, but now I see that I did it in the wrong patch. I do
> > > > not care much personally either way; I agree with you, though, that the
> > > > compiler will inline it anyway.
> > >
> > > The point here was that if we use "||" below (or multiple "if"s as I
> > > suggested in my review), we really want to inline the function to ensure
> > > that the branches here are merged with the one in the caller's "if ()".
> > >
> > > With the "|" there is not much effect.
> >
> > Even with if() do you really think there is a chance the function will
> > not be inlined? I see that much bigger functions are inlined.
>
> I don't know, it depends on the compiler flags, how much the function
> is used... Zeroing the chance is not bad.
>
AFAIK "inline" is just a hint anyway, and the compiler is free to ignore
it. That is why we have __always_inline, but the compiler should know
better here; I do not see a reason for __always_inline.
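
For reference, a minimal sketch of the distinction (the helper names are
hypothetical, and the __always_inline expansion is paraphrased from
include/linux/compiler-gcc.h, so treat it as illustrative):

    #include <stdbool.h>

    #define __always_inline inline __attribute__((always_inline))

    /* Plain "inline" is only a hint: the compiler may still emit the
     * function out of line, e.g. at -O0 or if it judges the body too
     * large to be worth inlining. */
    static inline bool check_hint(unsigned long long gpte)
    {
            return gpte != 0;
    }

    /* The kernel's __always_inline (roughly the macro above) forces
     * the body to be inlined at every call site. */
    static __always_inline bool check_forced(unsigned long long gpte)
    {
            return gpte != 0;
    }

Either way, for a one-line predicate like is_rsvd_bits_set() GCC will
inline both spellings at -O2 in practice, which is the point above.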
> > > > > >  {
> > > > > > -	int bit7;
> > > > > > +	int bit7 = (gpte >> 7) & 1, low5 = gpte & 0x3f;
> > >
> > > Low 6, actually.
> >
> > Well, 5. This is a bug; it should be 0x1f. Good catch :)
>
> MT is three bits, actually.
>
Yep, scrap that. Will rename it to low6.

> > > > > > -	bit7 = (gpte >> 7) & 1;
> > > > > > -	return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
> > > > > > +	return ((gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0) |
> > > > > > +		((mmu->bad_mt_xwr & (1ull << low5)) != 0);
> > >
> > > If you really want to optimize this thing to avoid branches, you can
> > > also change it to
> > >
> > >     return ((gpte & mmu->rsvd_bits_mask[bit7][level-1]) |
> > >             (mmu->bad_mt_xwr & (1ull << low5))) != 0;
> >
> > Why not drop the second "!= 0" then?
>
> Because the function is "bool". I dislike the magic "!= 0"
> that the compiler adds on conversion to bool. It always seemed
> like a recipe for trouble, since "int" and "bool" are otherwise
> interchangeable... Without that "!= 0", s/bool/int/ would ignore
> the upper 32 bits and break.
>
I actually checked that before proposing it: printf("%d\n", (bool)0x1000000000)
prints one. Of course, if bool is typedef'ed to int it will not work, but it
should not be.
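
For anyone who wants to reproduce that, a minimal userspace version (my
sketch, not from the patch; the int conversion is strictly
implementation-defined, but on the usual x86 ABIs it truncates):

    #include <stdio.h>
    #include <stdbool.h>

    int main(void)
    {
            unsigned long long v = 1ULL << 36; /* only a bit above 32 set */

            bool b = v; /* conversion to bool tests the whole value: b == 1 */
            int  i = v; /* conversion to int keeps the low 32 bits: i == 0 */

            printf("%d %d\n", b, i); /* prints "1 0" */
            return 0;
    }

So C99 bool does the right thing by itself; the breakage only appears if
someone later does s/bool/int/, which is the concern above.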
> > > and consider adding the bits to bad_mt_xwr to detect non-presence
> > > at the same time as reserved bits (as non-presence is also tested
> > > near both callers of is_rsvd_bits_set).
> >
> > We need to distinguish between the two at least at one call site.
>
> You can always check for presence afterwards, like
>
>     if (is_rsvd_bit_set_or_nonpresent(pte)) {
>         if (!is_present_pte(pte))
>             ...
>     }
>
I can, but this is unrelated; we can consider it afterwards.

> > > On the other hand, it starts to look like useless complication not
> > > backed by any benchmark (and probably unmeasurable anyway). Neither of
> > > the conditions is particularly easy to compute, and they are probably
> > > more expensive than a well-predicted branch. Thus, in this case I
> > > would prefer to have clearer code and just use two "if"s, the second
> > > of which can be guarded to be done only for EPT (it doesn't have to
> > > be an #ifdef, no?).
> >
> > return a | b
> >
> > is less clear than:
> >
> > if (a)
> >     return true;
> > if (b)
> >     return true;
>
> If each of a and b is actually a complicated expression having
> AND/SHIFT/arrays, then yes. :)
>
This is not the most complex expression possible :) Just having them on
different lines is not enough?

> > > If you do not want to do that, now that this is checked also on non-EPT
> > > PTEs we should rename it to something else like bad_low_six_bits?
> > > Regular page tables have no MT and XWR bits.
> >
> > I think the current name better describes the purpose of the field. It
> > shows that for non-EPT it is irrelevant, but if we use it to detect
> > nonpresent PTEs for regular page tables too, the rename makes perfect
> > sense. Let's rename it then.
>
> Agreed on both counts (leaving it as is for the current name, renaming
> it if used for nonpresent PTEs).
>
> Paolo

--
			Gleb.