On Wed, May 24, 2023, Peter Xu wrote:
> On Wed, May 24, 2023 at 09:46:13AM -0700, Sean Christopherson wrote:
> > If we hack kvm_pfn_to_refcounted_page(), then all of those protections are lost
> > because KVM would drop its assertions and also skip dirtying pages, i.e. would
> > effectively suppress the latent detection by check_new_page_bad().
>
> So it's probably that I totally have no idea what are the attributes for
> those special pages so I don't understand enough on why we need to handle
> those pages differently from e.g. PFNMAP pages, and also the benefits.
>
> I think what I can tell is that they're pages that doesn't have
> PageCompound bits set on either head or tails, however it's still a
> multi-2-order large page.  Is there an example on how these pages are used
> and allocated?  Why would we need those pages, and whether these pages need
> to be set dirty/accessed after all?

The use case David is interested in is where an AMD GPU driver kmallocs() a
chunk of memory, lets it be mmap()'d by userspace, and userspace then maps it
into the guest for a virtual (passthrough?) GPU.  For all intents and purposes,
it's normal memory, just not refcounted.

> >  static bool kvm_is_ad_tracked_page(struct page *page)
> >  {
> > +	/*
> > +	 * Assert that KVM isn't attempting to mark a freed page as Accessed or
> > +	 * Dirty, i.e. that KVM's MMU doesn't have a use-after-free bug.  KVM
> > +	 * (typically) doesn't pin pages that are mapped in KVM's MMU, and
> > +	 * instead relies on mmu_notifiers to know when a mapping needs to be
> > +	 * zapped/invalidated.  Unmapping from KVM's MMU must happen _before_
> > +	 * KVM returns from its mmu_notifier, i.e. the page should have an
> > +	 * elevated refcount at this point even though KVM doesn't hold a
> > +	 * reference of its own.
> > +	 */
> > +	if (WARN_ON_ONCE(!page_count(page)))
> > +		return false;
> > +
> >  	/*
> >  	 * Per page-flags.h, pages tagged PG_reserved "should in general not be
> >  	 * touched (e.g. set dirty) except by its owner".
> >
>
> This looks like a good thing to have, indeed.  But again it doesn't seem
> like anything special to the pages we're discussing here, say, !Compound &&
> refcount==0 ones.

The problem is that if KVM ignores refcount==0 pages, then KVM can't distinguish
between the legitimate[*] refcount==0 AMD GPU case and a buggy refcount==0
use-after-free scenario.  I don't want to make that sacrifice as the legitimate
!refcounted use case is a very specific use case, whereas consuming refcounted
memory is ubiquitous (outside of maybe AWS).

[*] Consuming !refcounted pages is safe only for flows that are tied into the
    mmu_notifiers.  The current proposal/plan is to add an off-by-default module
    param that lets userspace opt in to kmap() use of !refcounted memory, e.g.
    this case and PFNMAP memory.
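
To make the refcount==0 argument above concrete, here is a minimal userspace
sketch, not kernel code.  It assumes a hypothetical struct model_page standing
in for struct page, and two hypothetical helpers: one that silently skips
refcount==0 pages, and one modelled on the quoted check, where a counter stands
in for WARN_ON_ONCE().  All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct model_page {
	int refcount;	/* models page_count(page) */
	bool reserved;	/* models PageReserved(page) */
};

/* Counts how often the modelled WARN_ON_ONCE() condition was hit. */
int model_warn_count;

/*
 * Policy A: silently skip refcount==0 pages.  A freed page and a
 * legitimately !refcounted page are indistinguishable here, so a
 * use-after-free goes unreported.
 */
bool ad_tracked_ignore_zero(const struct model_page *page)
{
	assert(page != NULL);
	if (page->refcount == 0)
		return false;
	return !page->reserved;
}

/*
 * Policy B, modelled on the quoted check: a refcount==0 page reaching
 * this point is reported as a bug, then skipped.
 */
bool ad_tracked_assert_nonzero(const struct model_page *page)
{
	assert(page != NULL);
	if (page->refcount == 0) {
		model_warn_count++;
		return false;
	}
	return !page->reserved;
}
```

Both helpers return false for a refcount==0 page; only the second leaves a
trace that something was wrong.  That is the detection that would be lost if
refcount==0 were simply treated as "not refcounted, ignore".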