On Wed, Nov 23, 2022 at 03:34:54PM +0100, Daniel Vetter wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1376a47fedeedb..4161241fc3228c 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> >  		return r;
> >  	}
> >
> > +	/*
> > +	 * Special PTEs are never convertible into a struct page, even if the
> > +	 * driver that owns them might have put a PFN with a struct page into
> > +	 * the PFNMAP. If the arch doesn't support special then we cannot
> > +	 * safely process these pages.
> > +	 */
> > +#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
> > +	if (pte_special(*ptep))
> > +		return -EINVAL;
>
> On second thought this wont work, because it completely defeats the
> point of why this code here exists. remap_pfn_range() (which is what
> the various dma_mmap functions and the ioremap functions are built on
> top of too) sets VM_PFNMAP too, so this check would even catch the
> static mappings.

The problem with the way this code is designed is how it allows
returning the pfn without taking any reference based on things like
!pfn_valid or page_reserved. This allows it to then conditionally put
back the reference based on the same reasoning. It is impossible to
thread pte special into that since it is a PTE flag, not a property of
the PFN.

I don't entirely understand why it needs the page reference at all,
even if it is available - so I can't guess why it is OK to ignore the
page reference in other cases, or why it is OK to be racy..

Eg hmm_range_fault() does not obtain page references and implements a
very similar algorithm to kvm.

> Plus these static mappings aren't all that static either, e.g. pci
> access also can revoke bar mappings nowadays.

And there are already mmu notifiers to handle that, AFAIK.

Jason
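
To illustrate the asymmetry described above, here is a minimal userspace
sketch, not kernel code. It assumes a toy page array in which the decision
to take or drop a reference is derived from the PFN alone, as the reply
describes for the KVM path. Every toy_* name is hypothetical and exists
only for this illustration; none of them is a kernel API.

```c
#include <stdbool.h>

/* Userspace toy model, NOT kernel code. Every toy_* name is hypothetical. */
struct toy_page {
	bool valid;     /* stands in for a pfn_valid()-style check */
	bool reserved;  /* stands in for a reserved-page check */
	int refcount;
};

#define TOY_NR_PAGES 4
static struct toy_page toy_pages[TOY_NR_PAGES] = {
	[1] = { .valid = true, .reserved = false, .refcount = 1 },
	[2] = { .valid = true, .reserved = true,  .refcount = 1 },
};

/* Lookup and release both derive the refcounting decision from the PFN. */
bool toy_pfn_is_refcounted(unsigned long pfn)
{
	return pfn < TOY_NR_PAGES && toy_pages[pfn].valid &&
	       !toy_pages[pfn].reserved;
}

/* Lookup that ignores the PTE flag: get and put stay balanced. */
unsigned long toy_lookup(unsigned long pfn, bool pte_is_special)
{
	(void)pte_is_special;
	if (toy_pfn_is_refcounted(pfn))
		toy_pages[pfn].refcount++;
	return pfn;
}

/* Lookup that tries to skip the reference for special PTEs. */
unsigned long toy_lookup_skip_special(unsigned long pfn, bool pte_is_special)
{
	if (!pte_is_special && toy_pfn_is_refcounted(pfn))
		toy_pages[pfn].refcount++;
	return pfn;
}

/* Release only sees the bare PFN; the PTE flag is no longer known here. */
void toy_release(unsigned long pfn)
{
	if (toy_pfn_is_refcounted(pfn))
		toy_pages[pfn].refcount--;
}
```

In this model, pairing toy_lookup_skip_special(1, true) with toy_release()
drops a reference that was never taken, because the release side can only
repeat the PFN-based test. Making the two sides agree would require
carrying the PTE flag alongside the PFN, which is the design problem the
reply points at.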