Am 23.11.22 um 16:08 schrieb Jason Gunthorpe:
On Wed, Nov 23, 2022 at 03:34:54PM +0100, Daniel Vetter wrote:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1376a47fedeedb..4161241fc3228c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2598,6 +2598,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
return r;
}
+ /*
+ * Special PTEs are never convertible into a struct page, even if the
+ * driver that owns them might have put a PFN with a struct page into
+ * the PFNMAP. If the arch doesn't support special then we cannot
+ * safely process these pages.
+ */
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
+ if (pte_special(*ptep))
+ return -EINVAL;
On second thought this wont work, because it completely defeats the
point of why this code here exists. remap_pfn_range() (which is what
the various dma_mmap functions and the ioremap functions are built on
top of too) sets VM_PFNMAP too, so this check would even catch the
static mappings.
The problem with the way this code is designed is how it allows
returning the pfn without taking any reference based on things like
!pfn_valid or page_reserved. This allows it to then conditionally put
back the reference based on the same reasoning. It is impossible to
thread pte special into that since it is a PTE flag, not a property of
the PFN.
I don't entirely understand why it needs the page reference at all,
That's exactly what I've pointed out in the previous discussion about
that code as well.
As far as I can see it this is just another case where people assumed
that grabbing a page reference somehow magically prevents the pte from
changing.
I have not the slightest idea how people got this impression, but I have
heard it so many time from so many different sources that there must be
some common cause to this. Is the maybe some book or tutorial how to
sophisticate break the kernel or something like this?
Anyway as far as I can see only correct approach would be to use an MMU
notifier or more high level hmm_range_fault()+seq number.
Regards,
Christian.
even if it is available - so I can't guess why it is OK to ignore the
page reference in other cases, or why it is OK to be racy..
Eg hmm_range_fault() does not obtain page references and implements a
very similar algorithm to kvm.
Plus these static mappings aren't all that static either, e.g. pci
access also can revoke bar mappings nowadays.
And there are already mmu notifiers to handle that, AFAIK.
Jason