On Mon, Jul 10, 2023 at 10:19 AM Axel Rasmussen <axelrasmussen@xxxxxxxxxx> wrote: > > On Sat, Jul 8, 2023 at 6:08 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > On Fri, 7 Jul 2023 14:55:33 -0700 Axel Rasmussen <axelrasmussen@xxxxxxxxxx> wrote: > > > > > Future patches will re-use PTE_MARKER_SWAPIN_ERROR to implement > > > UFFDIO_POISON, so make some various preparations for that: > > > > > > First, rename it to just PTE_MARKER_POISONED. The "SWAPIN" can be > > > confusing since we're going to re-use it for something not really > > > related to swap. This can be particularly confusing for things like > > > hugetlbfs, which doesn't support swap whatsoever. Also rename some > > > various helper functions. > > > > > > Next, fix pte marker copying for hugetlbfs. Previously, it would WARN on > > > seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swap. > > > But, since we're going to re-use it, we want it to go ahead and copy it > > > just like non-hugetlbfs memory does today. Since the code to do this is > > > more complicated now, pull it out into a helper which can be re-used in > > > both places. While we're at it, also make it slightly more explicit in > > > its handling of e.g. uffd wp markers. > > > > > > For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS for > > > an error entry, return VM_FAULT_HWPOISON. For most cases this change > > > doesn't matter, e.g. a userspace program would receive a SIGBUS either > > > way. But for UFFDIO_POISON, this change will let KVM guests get an MCE > > > out of the box, instead of giving a SIGBUS to the hypervisor and > > > requiring it to somehow inject an MCE. > > > > > > Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return > > > VM_FAULT_HWPOISON_LARGE in such cases. Note that this can't happen today > > > because the lack of swap support means we'll never end up with such a > > > PTE anyway, but this behavior will be needed once such entries *can* > > > show up via UFFDIO_POISON. > > > > > > --- a/include/linux/mm_inline.h > > > +++ b/include/linux/mm_inline.h > > > @@ -523,6 +523,25 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm) > > > return atomic_read(&mm->tlb_flush_pending) > 1; > > > } > > > > > > +/* > > > + * Computes the pte marker to copy from the given source entry into dst_vma. > > > + * If no marker should be copied, returns 0. > > > + * The caller should insert a new pte created with make_pte_marker(). > > > + */ > > > +static inline pte_marker copy_pte_marker( > > > + swp_entry_t entry, struct vm_area_struct *dst_vma) > > > +{ > > > + pte_marker srcm = pte_marker_get(entry); > > > + /* Always copy error entries. */ > > > + pte_marker dstm = srcm & PTE_MARKER_POISONED; > > > + > > > + /* Only copy PTE markers if UFFD register matches. */ > > > + if ((srcm & PTE_MARKER_UFFD_WP) && userfaultfd_wp(dst_vma)) > > > + dstm |= PTE_MARKER_UFFD_WP; > > > + > > > + return dstm; > > > +} > > > > Breaks the build with CONFIG_MMU=n (arm allnoconfig). pte_marker isn't > > defined. > > > > I'll slap #ifdef CONFIG_MMU around this function, but probably somethng more > > fine-grained could be used, like CONFIG_PTE_MARKER_UFFD_WP. Please > > consider. > > Whoops, sorry about this. This function "ought" to be in > include/linux/swapops.h where it would be inside a #ifdef CONFIG_MMU > anyway, but it can't be because it uses userfaultfd_wp() so there'd be > a circular include. I think just wrapping it in CONFIG_MMU is the > right way. > > But, this has also made me realize we need to not advertise > UFFDIO_POISON as supported unless we have CONFIG_MMU. I don't want > HAVE_ARCH_USERFAULTFD_WP for that, because it's only enabled on > x86_64, whereas I want to support at least arm64 as well. I don't see > a strong reason not to just use CONFIG_MMU for this too; this feature > depends on the API in swapops.h, which uses that ifdef, so I don't see > a lot of value out of creating a new but equivalent config option. Actually, I'm being silly. CONFIG_USERFAULTFD depends on CONFIG_MMU, so we don't need to worry about most of this. Andrew's fix to just wrap the helper in CONFIG_MMU is enough. > > I'll make the needed changes (and also address Peter's comment above) > and send out a v5. > > > > > btw, both copy_pte_marker() and pte_install_uffd_wp_if_needed() look > > far too large to justify inlining. Please review the desirability of > > this. As far as inlining goes, I'm not opposed to un-inlining this, I was mainly copying that pattern from existing helpers in swapops.h. One question is, if it weren't inline, where should it go? There is no mm/swapops.c which I would say is otherwise the proper place for it. I don't see any other good place for the functions to go. The one I'm introducing isn't userfaultfd-specific so userfaultfd.c seems wrong. > > > >