On Wed, Jan 05, 2022, David Stevens wrote: > On Fri, Dec 31, 2021 at 4:22 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > */ > > > - if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn)))) > > > + if (!pfn_valid(pfn) || !page_count(pfn_to_page(pfn))) > > > > Hrm, I know the whole point of this series is to support pages without an elevated > > refcount, but this WARN was extremely helpful in catching several use-after-free > > bugs in the TDP MMU. We talked about burying a slow check behind MMU_WARN_ON, but > > that isn't very helpful because no one runs with MMU_WARN_ON, and this is also a > > type of check that's most useful if it runs in production. > > > > IIUC, this series explicitly disallows using pfns that have a struct page without > > refcounting, and the issue with the WARN here is that kvm_is_zone_device_pfn() is > > called by kvm_is_reserved_pfn() before ensure_pfn_ref() rejects problematic pages, > > i.e. triggers false positive. > > > > So, can't we preserve the use-after-free benefits of the check by moving it to > > where KVM releases the PFN? I.e. > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index fbca2e232e94..675b835525fa 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -2904,15 +2904,19 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty); > > > > void kvm_set_pfn_dirty(kvm_pfn_t pfn) > > { > > - if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) > > + if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) { > > + WARN_ON_ONCE(!page_count(pfn_to_page(pfn))); > > SetPageDirty(pfn_to_page(pfn)); > > + } > > } > > EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty); > > I'm still seeing this warning show up via __handle_changed_spte > calling kvm_set_pfn_dirty: > > [ 113.350473] kvm_set_pfn_dirty+0x26/0x3e > [ 113.354861] __handle_changed_spte+0x452/0x4f6 > [ 113.359841] __handle_changed_spte+0x452/0x4f6 > [ 113.364819] __handle_changed_spte+0x452/0x4f6 > [ 113.369790] zap_gfn_range+0x1de/0x27a > [ 113.373992] kvm_tdp_mmu_zap_invalidated_roots+0x64/0xb8 > [ 113.379945] kvm_mmu_zap_all_fast+0x18c/0x1c1 > [ 113.384827] kvm_page_track_flush_slot+0x55/0x87 > [ 113.390000] kvm_set_memslot+0x137/0x455 > [ 113.394394] kvm_delete_memslot+0x5c/0x91 > [ 113.398888] __kvm_set_memory_region+0x3c0/0x5e6 > [ 113.404061] kvm_set_memory_region+0x45/0x74 > [ 113.408844] kvm_vm_ioctl+0x563/0x60c > > I wasn't seeing it for my particular test case, but the gfn aging code > might trigger the warning as well. Ah, I got royally confused by ensure_pfn_ref()'s comment * Certain IO or PFNMAP mappings can be backed with valid * struct pages, but be allocated without refcounting e.g., * tail pages of non-compound higher order allocations, which * would then underflow the refcount when the caller does the * required put_page. Don't allow those pages here. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ that doesn't apply here because kvm_faultin_pfn() uses the low level __gfn_to_pfn_page_memslot(). and my understanding is that @page will be non-NULL in ensure_pfn_ref() iff the page has an elevated refcount. Can you update the changelogs for the x86+arm64 "use gfn_to_pfn_page" patches to explicitly call out the various ramifications of moving to gfn_to_pfn_page()? Side topic, s/covert/convert in both changelogs :-) > I don't know if setting the dirty/accessed bits in non-refcounted > struct pages is problematic. Without knowing exactly what lies behind such pages, KVM needs to set dirty bits, otherwise there's a potential for data lost. > The only way I can see to avoid it would be to try to map from the spte to > the vma and then check its flags. If setting the flags is benign, then we'd > need to do that lookup to differentiate the safe case from the use-after-free > case. Do you have any advice on how to handle this? Hrm. I can't think of a clever generic solution. But for x86-64, we can use a software available bit to mark SPTEs as being refcounted use that flag to assert the refcount is elevated when marking the backing pfn dirty/accessed. It'd be 64-bit only because we're out of software available bits for PAE paging, but (a) practically no one cares about 32-bit and (b) odds are slim that a use-after-free would be unique to 32-bit KVM. But that can all go in after your series is merged, e.g. I'd prefer to cleanup make_spte()'s prototype to use @fault adding yet another parameter, and that'll take a few patches to make happen since FNAME(sync_page) also uses make_spte(). TL;DR: continue as you were, I'll stop whining about this :-)