Re: [PATCH 2/2] KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 11, 2019 at 12:56:33PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:
> 
> > The cr4_pae flag is a bit of a misnomer, its purpose is really to track
> > whether the guest PTE that is being shadowed is a 4-byte entry or an
> > 8-byte entry.  Prior to supporting nested EPT, the size of the gpte was
> > reflected purely by CR4.PAE.  KVM fudged things a bit for direct sptes,
> > but it was mostly harmless since the size of the gpte never mattered.
> > Now that a spte may be tracking an indirect EPT entry, relying on
> > CR4.PAE is wrong and ill-named.
> >
> > For direct shadow pages, force the gpte_size to '1' as they are always
> > 8-byte entries; EPT entries can only be 8-bytes and KVM always uses
> > 8-byte entries for NPT and its identity map (when running with EPT but
> > not unrestricted guest).
> >
> > Likewise, nested EPT entries are always 8-bytes.  Nested EPT presents a
> > unique scenario as the size of the entries are not dictated by CR4.PAE,
> > but neither is the shadow page a direct map.  To handle this scenario,
> > set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination,
> > to denote a nested EPT shadow page.  Use the information to avoid
> > incorrectly zapping an unsync'd indirect page in __kvm_sync_page().
> >
> > Providing a consistent and accurate gpte_size fixes a bug reported by
> > Vitaly where fast_cr3_switch() always fails when switching from L2 to
> > L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages,
> > whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.
> >
> > Fixes: 7dcd575520082 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
> > Reported-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> 
> Thank you for working on this!
> 
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > ---
> >  Documentation/virtual/kvm/mmu.txt | 11 +++++----
> >  arch/x86/include/asm/kvm_host.h   |  4 ++--
> >  arch/x86/kvm/mmu.c                | 38 +++++++++++++++++++------------
> >  arch/x86/kvm/mmutrace.h           |  4 ++--
> >  4 files changed, 35 insertions(+), 22 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> > index f365102c80f5..2747ad774a38 100644
> > --- a/Documentation/virtual/kvm/mmu.txt
> > +++ b/Documentation/virtual/kvm/mmu.txt
> > @@ -142,7 +142,7 @@ Shadow pages contain the following information:
> >      If clear, this page corresponds to a guest page table denoted by the gfn
> >      field.
> >    role.quadrant:
> > -    When role.cr4_pae=0, the guest uses 32-bit gptes while the host uses 64-bit
> > +    When role.gpte_size=0, the guest uses 32-bit gptes while the host uses 64-bit
> >      sptes.  That means a guest page table contains more ptes than the host,
> >      so multiple shadow pages are needed to shadow one guest page.
> >      For first-level shadow pages, role.quadrant can be 0 or 1 and denotes the
> > @@ -158,9 +158,9 @@ Shadow pages contain the following information:
> >      The page is invalid and should not be used.  It is a root page that is
> >      currently pinned (by a cpu hardware register pointing to it); once it is
> >      unpinned it will be destroyed.
> > -  role.cr4_pae:
> > -    Contains the value of cr4.pae for which the page is valid (e.g. whether
> > -    32-bit or 64-bit gptes are in use).
> > +  role.gpte_size:
> > +    Reflects the size of the guest PTE for which the page is valid, i.e. '1'
> > +    if 64-bit gptes are in use, '0' if 32-bit gptes are in use.
> >    role.nxe:
> >      Contains the value of efer.nxe for which the page is valid.
> >    role.cr0_wp:
> > @@ -173,6 +173,9 @@ Shadow pages contain the following information:
> >      Contains the value of cr4.smap && !cr0.wp for which the page is valid
> >      (pages for which this is true are different from other pages; see the
> >      treatment of cr0.wp=0 below).
> > +  role.ept_sp:
> > +    This is a virtual flag to denote a shadowed nested EPT page.  ept_sp
> > +    is true if "cr0_wp && smap_andnot_wp", an otherwise invalid combination.
> >    role.smm:
> >      Is 1 if the page is valid in system management mode.  This field
> >      determines which of the kvm_memslots array was used to build this
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 7712b4ed8aa1..fdf67d33f6b3 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -253,14 +253,14 @@ struct kvm_mmu_memory_cache {
> >   * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
> >   * by indirect shadow page can not be more than 15 bits.
> >   *
> > - * Currently, we used 14 bits that are @level, @cr4_pae, @quadrant, @access,
> > + * Currently, we used 14 bits that are @level, @gpte_size, @quadrant, @access,
> >   * @nxe, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp.
> >   */
> >  union kvm_mmu_page_role {
> >  	u32 word;
> >  	struct {
> >  		unsigned level:4;
> > -		unsigned cr4_pae:1;
> > +		unsigned gpte_size:1;
> 
> This is probably just a matter of taste but I would prefer it to be
> renamed to something like "gpte_is_64": when I see
> 
> role.gpte_size = true
> 
> I have to go and read the definition if gpte_size,
> 
> role.gpte_is_64 = true
> 
> reads more naturally. 

Agreed.  What about gpte_is_8_bytes to avoid being misconstrued as "this
gpte is for paging in 64-bit mode"?

> >  		unsigned quadrant:2;
> >  		unsigned direct:1;
> >  		unsigned access:3;



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux