On Mon, Mar 11, 2019 at 12:56:33PM +0100, Vitaly Kuznetsov wrote: > Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes: > > > The cr4_pae flag is a bit of a misnomer, its purpose is really to track > > whether the guest PTE that is being shadowed is a 4-byte entry or an > > 8-byte entry. Prior to supporting nested EPT, the size of the gpte was > > reflected purely by CR4.PAE. KVM fudged things a bit for direct sptes, > > but it was mostly harmless since the size of the gpte never mattered. > > Now that a spte may be tracking an indirect EPT entry, relying on > > CR4.PAE is wrong and ill-named. > > > > For direct shadow pages, force the gpte_size to '1' as they are always > > 8-byte entries; EPT entries can only be 8-bytes and KVM always uses > > 8-byte entries for NPT and its identity map (when running with EPT but > > not unrestricted guest). > > > > Likewise, nested EPT entries are always 8-bytes. Nested EPT presents a > > unique scenario as the size of the entries are not dictated by CR4.PAE, > > but neither is the shadow page a direct map. To handle this scenario, > > set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination, > > to denote a nested EPT shadow page. Use the information to avoid > > incorrectly zapping an unsync'd indirect page in __kvm_sync_page(). > > > > Providing a consistent and accurate gpte_size fixes a bug reported by > > Vitaly where fast_cr3_switch() always fails when switching from L2 to > > L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages, > > whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE. > > > > Fixes: 7dcd575520082 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") > > Reported-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> > > Thank you for working on this! > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > --- > > Documentation/virtual/kvm/mmu.txt | 11 +++++---- > > arch/x86/include/asm/kvm_host.h | 4 ++-- > > arch/x86/kvm/mmu.c | 38 +++++++++++++++++++------------ > > arch/x86/kvm/mmutrace.h | 4 ++-- > > 4 files changed, 35 insertions(+), 22 deletions(-) > > > > diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt > > index f365102c80f5..2747ad774a38 100644 > > --- a/Documentation/virtual/kvm/mmu.txt > > +++ b/Documentation/virtual/kvm/mmu.txt > > @@ -142,7 +142,7 @@ Shadow pages contain the following information: > > If clear, this page corresponds to a guest page table denoted by the gfn > > field. > > role.quadrant: > > - When role.cr4_pae=0, the guest uses 32-bit gptes while the host uses 64-bit > > + When role.gpte_size=0, the guest uses 32-bit gptes while the host uses 64-bit > > sptes. That means a guest page table contains more ptes than the host, > > so multiple shadow pages are needed to shadow one guest page. > > For first-level shadow pages, role.quadrant can be 0 or 1 and denotes the > > @@ -158,9 +158,9 @@ Shadow pages contain the following information: > > The page is invalid and should not be used. It is a root page that is > > currently pinned (by a cpu hardware register pointing to it); once it is > > unpinned it will be destroyed. > > - role.cr4_pae: > > - Contains the value of cr4.pae for which the page is valid (e.g. whether > > - 32-bit or 64-bit gptes are in use). > > + role.gpte_size: > > + Reflects the size of the guest PTE for which the page is valid, i.e. '1' > > + if 64-bit gptes are in use, '0' if 32-bit gptes are in use. > > role.nxe: > > Contains the value of efer.nxe for which the page is valid. > > role.cr0_wp: > > @@ -173,6 +173,9 @@ Shadow pages contain the following information: > > Contains the value of cr4.smap && !cr0.wp for which the page is valid > > (pages for which this is true are different from other pages; see the > > treatment of cr0.wp=0 below). > > + role.ept_sp: > > + This is a virtual flag to denote a shadowed nested EPT page. ept_sp > > + is true if "cr0_wp && smap_andnot_wp", an otherwise invalid combination. > > role.smm: > > Is 1 if the page is valid in system management mode. This field > > determines which of the kvm_memslots array was used to build this > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > index 7712b4ed8aa1..fdf67d33f6b3 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -253,14 +253,14 @@ struct kvm_mmu_memory_cache { > > * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used > > * by indirect shadow page can not be more than 15 bits. > > * > > - * Currently, we used 14 bits that are @level, @cr4_pae, @quadrant, @access, > > + * Currently, we used 14 bits that are @level, @gpte_size, @quadrant, @access, > > * @nxe, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp. > > */ > > union kvm_mmu_page_role { > > u32 word; > > struct { > > unsigned level:4; > > - unsigned cr4_pae:1; > > + unsigned gpte_size:1; > > This is probably just a matter of taste but I would prefer it to be > renamed to something like "gpte_is_64": when I see > > role.gpte_size = true > > I have to go and read the definition if gpte_size, > > role.gpte_is_64 = true > > reads more naturally. Agreed. What about gpte_is_8_bytes to avoid being misconstrued as "this gpte is for paging in 64-bit mode"? > > unsigned quadrant:2; > > unsigned direct:1; > > unsigned access:3;