On Fri, Feb 05, 2021 at 05:32:57AM -0500, Paolo Bonzini wrote: > This series is the first step towards fixing KVM's usage of follow_pfn. > The immediate fix here is that KVM is not checking the writability of > the PFN, which actually dates back to way before the introduction of > follow_pfn in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults > before giving up", 2016-07-05). There are more changes needed to > invalidate gfn-to-pfn caches from MMU notifiers, but this issue will > be tackled later. > > A more fundamental issue however is that the follow_pfn function is > basically impossible to use correctly. Almost all users for example > are assuming that the page is writable; KVM was not alone in this > mistake. follow_pte, despite not being exported for modules, is a > far saner API. Therefore, patch 1 simplifies follow_pte a bit and > makes it available to modules. > > Please review and possibly ack for inclusion in the KVM tree, > thanks! FWIW, the patches look correct to me (if with patch 2 report fixed): Reviewed-by: Peter Xu <peterx@xxxxxxxxxx> But I do have a question on why dax as the only user needs to pass in the notifier to follow_pte() for initialization. Indeed there're a difference on start/end init of the notifier depending on whether it's a huge pmd but since there's the pmdp passed over too so I assume the caller should know how to init the notifier anyways. The thing is at least in current code we could send meaningless notifiers, e.g., in follow_pte(): if (range) { mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0, NULL, mm, address & PAGE_MASK, (address & PAGE_MASK) + PAGE_SIZE); mmu_notifier_invalidate_range_start(range); } ptep = pte_offset_map_lock(mm, pmd, address, ptlp); if (!pte_present(*ptep)) goto unlock; *ptepp = ptep; return 0; unlock: pte_unmap_unlock(ptep, *ptlp); if (range) mmu_notifier_invalidate_range_end(range); The notify could be meaningless if we do the "goto unlock" path. Ideally it seems we can move the notifier code to caller (as what most mmu notifier users do) and we can also avoid doing that if follow_pte returned -EINVAL. Thanks, -- Peter Xu