On Thu, Mar 21, 2024 at 10:59:53PM +0100, David Hildenbrand wrote: > s390x must disable shared zeropages for processes running VMs, because > the VMs could end up making use of "storage keys" or protected > virtualization, which are incompatible with shared zeropages. > > Yet, with userfaultfd it is possible to insert shared zeropages into > such processes. Let's fallback to simply allocating a fresh zeroed > anonymous folio and insert that instead. > > mm_forbids_zeropage() was introduced in commit 593befa6ab74 ("mm: introduce > mm_forbids_zeropage function"), briefly before userfaultfd went > upstream. > > Note that we don't want to fail the UFFDIO_ZEROPAGE request like we do > for hugetlb, it would be rather unexpected. Further, we also > cannot really indicated "not supported" to user space ahead of time: it > could be that the MM disallows zeropages after userfaultfd was already > registered. > > Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> Reviewed-by: Peter Xu <peterx@xxxxxxxxxx> Still, a few comments below. > --- > mm/userfaultfd.c | 35 +++++++++++++++++++++++++++++++++++ > 1 file changed, 35 insertions(+) > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index 712160cd41eca..1d1061ccd1dea 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -316,6 +316,38 @@ static int mfill_atomic_pte_copy(pmd_t *dst_pmd, > goto out; > } > > +static int mfill_atomic_pte_zeroed_folio(pmd_t *dst_pmd, > + struct vm_area_struct *dst_vma, unsigned long dst_addr) > +{ > + struct folio *folio; > + int ret; nitpick: we can set -ENOMEM here, then > + > + folio = vma_alloc_zeroed_movable_folio(dst_vma, dst_addr); > + if (!folio) > + return -ENOMEM; return ret; > + > + ret = -ENOMEM; drop. > + if (mem_cgroup_charge(folio, dst_vma->vm_mm, GFP_KERNEL)) > + goto out_put; > + > + /* > + * The memory barrier inside __folio_mark_uptodate makes sure that > + * preceding stores to the page contents become visible before > + * the set_pte_at() write. > + */ This comment doesn't apply. We can drop it. Thanks, > + __folio_mark_uptodate(folio); > + > + ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr, > + &folio->page, true, 0); > + if (ret) > + goto out_put; > + > + return 0; > +out_put: > + folio_put(folio); > + return ret; > +} > + > static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd, > struct vm_area_struct *dst_vma, > unsigned long dst_addr) > @@ -324,6 +356,9 @@ static int mfill_atomic_pte_zeropage(pmd_t *dst_pmd, > spinlock_t *ptl; > int ret; > > + if (mm_forbids_zeropage(dst_vma->mm)) > + return mfill_atomic_pte_zeroed_folio(dst_pmd, dst_vma, dst_addr); > + > _dst_pte = pte_mkspecial(pfn_pte(my_zero_pfn(dst_addr), > dst_vma->vm_page_prot)); > ret = -EAGAIN; > -- > 2.43.2 > -- Peter Xu