On Thu, 11 Jul 2024 at 10:09, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> When I was working on this patchset this year with the syscall, this is
> similar somewhat to the initial approach I was taking with setting up a
> special mapping. It turned into kind of a mess and I couldn't get it
> working. There's a lot of functionality built around anonymous pages
> that would need to be duplicated (I think?).

Yeah, I was kind of assuming that.

You'd need to handle VM_DROPPABLE in the fault path specially, the way
we currently split up based on vma_is_anonymous(), eg

        if (vma_is_anonymous(vmf->vma))
                return do_anonymous_page(vmf);
        else
                return do_fault(vmf);

in do_pte_missing() etc.

I don't actually think it would be too hard, but it's a more
"conceptual" change, and it's probably not worth it.

> Alright, an hour later of fiddling, and it doesn't actually work (yet?)
> -- the selftest fails. A diff follows below.

May I suggest a slightly different approach: do what we did for
"pte_mkwrite()".

It needed the vma too, for not too dissimilar reasons: special dirty
bit handling for the shadow stack. See

  bb3aadf7d446 ("x86/mm: Start actually marking _PAGE_SAVED_DIRTY")
  b497e52ddb2a ("x86/mm: Teach pte_mkwrite() about stack memory")

and now we have "pte_mkwrite_novma()" with the old semantics for the
legacy cases that didn't get converted - whether it's because the
architecture doesn't have the issue, or because it's a kernel pte.

And the conversion was actually quite pain-free, because we have

  #ifndef pte_mkwrite
  static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
  {
          return pte_mkwrite_novma(pte);
  }
  #endif

so all any architecture that didn't want this needed to do was to
rename their pte_mkwrite() to pte_mkwrite_novma() and they were done.
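To make the fallback mechanics concrete, here is a tiny userspace model of it. It is a sketch, not kernel code: pte_t, struct vm_area_struct and the TOY_PAGE_RW bit are simplified stand-ins with made-up values. The "arch" part only provides pte_mkwrite_novma(), so the generic #ifndef block supplies a two-argument pte_mkwrite() that ignores the vma.

```c
#include <assert.h>

/* Toy stand-ins for the kernel types; layout and bit values are hypothetical. */
typedef struct { unsigned long val; } pte_t;
struct vm_area_struct { unsigned long vm_flags; };

#define TOY_PAGE_RW 0x2UL /* hypothetical "writable" PTE bit */

/* "Arch" side: this architecture has no per-VMA special case, so after
 * the rename it only provides the VMA-less helper. */
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
	pte.val |= TOY_PAGE_RW;
	return pte;
}

/* "Generic" side: the fallback quoted in the mail. Since the arch did
 * not #define pte_mkwrite, this version is compiled and drops the vma. */
#ifndef pte_mkwrite
static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	(void)vma; /* unused on architectures without the special case */
	return pte_mkwrite_novma(pte);
}
#endif
```

An architecture that does need the vma (x86, for shadow stacks) instead declares its own two-argument pte_mkwrite() and, following the usual kernel idiom, adds "#define pte_mkwrite pte_mkwrite" so that the generic fallback is compiled out.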
In fact, that was done first as basically semantically no-op patches:

  2f0584f3f4bd ("mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma()")
  6ecc21bb432d ("mm: Move pte/pmd_mkwrite() callers with no VMA to _novma()")
  161e393c0f63 ("mm: Make pte_mkwrite() take a VMA")

which made this all very pain-free (and was largely a sed script, I think).

> -         !pte_dirty(pte) && !PageDirty(page))
> +         !pte_dirty(pte) && !PageDirty(page) &&
> +         !(vma->vm_flags & VM_DROPPABLE))

So instead of this kind of thing, we'd have

-         !pte_dirty(pte) && !PageDirty(page))
+         !pte_dirty(pte, vma) && !PageDirty(page))

and the advantage here is that you can't miss anybody by mistake. The
compiler will be very unhappy if you don't pass in the vma, and any
places that can't pass one would be explicitly converted to
"pte_dirty_novma()".

We don't actually have all that many users of pte_dirty(), so it
doesn't look too nasty.

And if we make the pte_dirty() semantics depend on the vma, I really
think we should do it the same way we did pte_mkwrite().

Long-term, maybe we should just aim to always pass in the vma to the
pte_xyz() functions, but...

              Linus
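A minimal sketch of the pte_dirty() / pte_dirty_novma() split suggested above, again as a userspace model with made-up types and bit values. TOY_VM_DROPPABLE is a hypothetical stand-in for VM_DROPPABLE, and the policy it encodes (a droppable mapping never reports its PTEs as dirty, so its pages can simply be discarded) is an assumption for illustration only; the mail argues for the signature change, not for this exact rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel types; bit values are hypothetical. */
typedef struct { unsigned long val; } pte_t;
struct vm_area_struct { unsigned long vm_flags; };

#define TOY_PAGE_DIRTY   0x40UL   /* hypothetical hardware dirty bit */
#define TOY_VM_DROPPABLE 0x1000UL /* hypothetical stand-in for VM_DROPPABLE */

/* Old semantics: look only at the PTE bit. Callers with no vma
 * (e.g. kernel page tables) would be switched to this name. */
static inline bool pte_dirty_novma(pte_t pte)
{
	return pte.val & TOY_PAGE_DIRTY;
}

/* VMA-aware variant. ASSUMED policy: a droppable mapping is never
 * reported dirty. Forgetting the vma argument is now a compile error. */
static inline bool pte_dirty(pte_t pte, struct vm_area_struct *vma)
{
	if (vma->vm_flags & TOY_VM_DROPPABLE)
		return false;
	return pte_dirty_novma(pte);
}
```

With this split, a call site like the one in the quoted diff needs no separate vm_flags test: the vma-dependent rule lives in one helper, and every caller has to choose between pte_dirty(pte, vma) and pte_dirty_novma(pte).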