On Tue, Jul 04, 2023 at 04:50:50PM +0900, David Stevens wrote: > From: David Stevens <stevensd@xxxxxxxxxxxx> > > Stop passing FOLL_GET to __kvm_follow_pfn. This allows the host to map > memory into the guest that is backed by un-refcounted struct pages - for > example, higher order non-compound pages allocated by the amdgpu driver > via ttm_pool_alloc_page. I guess you mean the tail pages of the higher order non-compound pages? And as to the head page, it is said to be set to one coincidentally[*], and shall not be considered as refcounted. IIUC, refcount of this head page will be increased and decreased soon in hva_to_pfn_remapped(), so this may not be a problem(?). But treating this head page differently, as a refcounted one(e.g., to set the A/D flags), is weired. Or maybe I missed some context, e.g., can the head page be allocted to guest at all? > > The bulk of this change is tracking the is_refcounted_page flag so that > non-refcounted pages don't trigger page_count() == 0 warnings. This is > done by storing the flag in an unused bit in the sptes. Also, maybe we should mention this only works on x86-64. > > Signed-off-by: David Stevens <stevensd@xxxxxxxxxxxx> > --- > arch/x86/kvm/mmu/mmu.c | 44 +++++++++++++++++++++------------ > arch/x86/kvm/mmu/mmu_internal.h | 1 + > arch/x86/kvm/mmu/paging_tmpl.h | 9 ++++--- > arch/x86/kvm/mmu/spte.c | 4 ++- > arch/x86/kvm/mmu/spte.h | 12 ++++++++- > arch/x86/kvm/mmu/tdp_mmu.c | 22 ++++++++++------- > 6 files changed, 62 insertions(+), 30 deletions(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index e44ab512c3a1..b1607e314497 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c ... > @@ -2937,6 +2943,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot, > bool host_writable = !fault || fault->map_writable; > bool prefetch = !fault || fault->prefetch; > bool write_fault = fault && fault->write; > + bool is_refcounted = !fault || fault->is_refcounted_page; Just wonder, what if a non-refcounted page is prefetched? Or is it possible in practice? ... > > @@ -883,7 +884,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, > */ > static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i) > { > - bool host_writable; > + bool host_writable, is_refcounted; > gpa_t first_pte_gpa; > u64 *sptep, spte; > struct kvm_memory_slot *slot; > @@ -940,10 +941,12 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int > sptep = &sp->spt[i]; > spte = *sptep; > host_writable = spte & shadow_host_writable_mask; > + // TODO: is this correct? > + is_refcounted = spte & SPTE_MMU_PAGE_REFCOUNTED; > slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); > make_spte(vcpu, sp, slot, pte_access, gfn, > spte_to_pfn(spte), spte, true, false, > - host_writable, &spte); > + host_writable, is_refcounted, &spte); Could we restrict that a non-refcounted page shall not be used as shadow page? [*] https://lore.kernel.org/all/8caf3008-dcf3-985a-631e-e019b277c6f0@xxxxxxx/ B.R. Yu