On Fri, Jul 7, 2023 at 12:58 AM Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote:
>
> On Thu, Jul 06, 2023 at 01:52:08PM +0900,
> David Stevens <stevensd@xxxxxxxxxxxx> wrote:
>
> > On Wed, Jul 5, 2023 at 7:17 PM Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 04, 2023 at 04:50:50PM +0900, David Stevens wrote:
> > > > From: David Stevens <stevensd@xxxxxxxxxxxx>
> > > >
> > > > Stop passing FOLL_GET to __kvm_follow_pfn. This allows the host to map
> > > > memory into the guest that is backed by un-refcounted struct pages - for
> > > > example, higher order non-compound pages allocated by the amdgpu driver
> > > > via ttm_pool_alloc_page.
> > >
> > > I guess you mean the tail pages of the higher order non-compound pages?
> > > And as to the head page, it is said to be set to one coincidentally[*],
> > > and shall not be considered as refcounted. IIUC, refcount of this head
> > > page will be increased and decreased soon in hva_to_pfn_remapped(), so
> > > this may not be a problem(?). But treating this head page differently,
> > > as a refcounted one (e.g., to set the A/D flags), is weird.
> > >
> > > Or maybe I missed some context, e.g., can the head page be allocated to
> > > guest at all?
> >
> > Yes, this is to allow mapping the tail pages of higher order
> > non-compound pages - I should have been more precise in my wording.
> > The head pages can already be mapped into the guest.
> >
> > Treating the head and tail pages differently would require changing how
> > KVM behaves in a situation it supports today (rather than just adding
> > support for an unsupported situation). Currently, without this series,
> > KVM can map VM_PFNMAP|VM_IO memory backed by refcounted pages into the
> > guest. When that happens, KVM sets the A/D flags. I'm not sure whether
> > that's actually valid behavior, nor do I know whether anyone actually
> > cares about it. But it's what KVM does today, and I would shy away
> > from modifying that behavior without good reason.
> >
> > > >
> > > > The bulk of this change is tracking the is_refcounted_page flag so that
> > > > non-refcounted pages don't trigger page_count() == 0 warnings. This is
> > > > done by storing the flag in an unused bit in the sptes.
> > >
> > > Also, maybe we should mention this only works on x86-64.
> > >
> > > >
> > > > Signed-off-by: David Stevens <stevensd@xxxxxxxxxxxx>
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c          | 44 +++++++++++++++++++++------------
> > > >  arch/x86/kvm/mmu/mmu_internal.h |  1 +
> > > >  arch/x86/kvm/mmu/paging_tmpl.h  |  9 ++++---
> > > >  arch/x86/kvm/mmu/spte.c         |  4 ++-
> > > >  arch/x86/kvm/mmu/spte.h         | 12 ++++++++-
> > > >  arch/x86/kvm/mmu/tdp_mmu.c      | 22 ++++++++++-------
> > > >  6 files changed, 62 insertions(+), 30 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e44ab512c3a1..b1607e314497 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > >
> > > ...
> > >
> > > > @@ -2937,6 +2943,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
> > > >       bool host_writable = !fault || fault->map_writable;
> > > >       bool prefetch = !fault || fault->prefetch;
> > > >       bool write_fault = fault && fault->write;
> > > > +     bool is_refcounted = !fault || fault->is_refcounted_page;
> > >
> > > Just wonder, what if a non-refcounted page is prefetched? Or is it possible in
> > > practice?
> >
> > Prefetching is still done via gfn_to_page_many_atomic, which sets
> > FOLL_GET. That's fixable, but it's not something this series currently
> > does.
>
> So if we prefetch a page, REFCOUNTED bit is cleared unconditionally with this
> hunk. kvm_set_page_{dirty, accessed} won't be called as expected for prefetched
> spte. If I read the patch correctly, REFCOUNTED bit in SPTE should represent
> whether the corresponding page is ref-countable or not, right?
>
> Because direct_pte_prefetch_many() is for legacy KVM MMU and FNAME(prefetch_pte)
> is shadow paging, we need to test it with legacy KVM MMU or shadow paging to hit
> the issue, though.
>

direct_pte_prefetch_many and prefetch_gpte both pass NULL for the
fault parameter, so is_refcounted will evaluate to true. So the spte's
refcounted bit will get set in that case.

-David
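
To make the NULL case concrete, here is a minimal standalone sketch of how
the expression from the quoted hunk evaluates. It is not kernel code:
struct fault_sketch, spte_refcounted_bit() and check_cases() are
hypothetical names made up for illustration, and only the one field
discussed in this thread is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fault descriptor; not the kernel struct. */
struct fault_sketch {
	bool is_refcounted_page;
};

/* Same boolean expression as the line added to mmu_set_spte() above. */
bool spte_refcounted_bit(const struct fault_sketch *fault)
{
	return !fault || fault->is_refcounted_page;
}

/* The three cases discussed in this thread. */
void check_cases(void)
{
	struct fault_sketch refcounted = { .is_refcounted_page = true };
	struct fault_sketch unrefcounted = { .is_refcounted_page = false };

	/* Prefetch paths pass NULL, so the bit ends up set. */
	assert(spte_refcounted_bit(NULL));
	/* A regular fault on a refcounted page also sets the bit. */
	assert(spte_refcounted_bit(&refcounted));
	/* Only a fault that reports a non-refcounted page clears it. */
	assert(!spte_refcounted_bit(&unrefcounted));
}
```

So in this model the bit is only cleared when a non-NULL fault says the
page is not refcounted, which matches the prefetch behavior described
above.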