On Tue, Aug 13, 2024 at 8:19 AM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote: > > Hi Jann, > > On 2024/8/13 00:42, Jann Horn wrote: > > The following race can occur: > > > > mfill_atomic other thread > > ============ ============ > > <zap PMD> > > pmdp_get_lockless() [reads none pmd] > > <bail if trans_huge> > > <if none:> > > <pagefault creates transhuge zeropage> > > __pte_alloc [no-op] > > <zap PMD> > > <bail if pmd_trans_huge(*dst_pmd)> > > BUG_ON(pmd_none(*dst_pmd)) > > > > I have experimentally verified this in a kernel with extra mdelay() calls; > > the BUG_ON(pmd_none(*dst_pmd)) triggers. > > > > On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow > > pte_offset_map[_lock]() to fail"), this can't lead to anything worse than > > a BUG_ON(), since the page table access helpers are actually designed to > > deal with page tables concurrently disappearing; but on older kernels > > (<=6.4), I think we could probably theoretically race past the two BUG_ON() > > checks and end up treating a hugepage as a page table. > > > > Cc: stable@xxxxxxxxxxxxxxx > > Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") > > Signed-off-by: Jann Horn <jannh@xxxxxxxxxx> > > --- > > mm/userfaultfd.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > index e54e5c8907fa..ec3750467aa5 100644 > > --- a/mm/userfaultfd.c > > +++ b/mm/userfaultfd.c > > @@ -801,7 +801,8 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx, > > break; > > } > > /* If an huge pmd materialized from under us fail */ > > - if (unlikely(pmd_trans_huge(*dst_pmd))) { > > + dst_pmdval = pmdp_get_lockless(dst_pmd); > > + if (unlikely(pmd_none(dst_pmdval) || pmd_trans_huge(dst_pmdval))) { > > Before commit 0d940a9b270b, should we also check for > is_pmd_migration_entry(), pmd_devmap() and pmd_bad() here? Oooh. I think you're right that this check is insufficient, thanks for spotting that. I think I should probably change the check to something like this? if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) || pmd_devmap(dst_pmdval) || pmd_bad(dst_pmdval))) { !pmd_present() implies !is_pmd_migration_entry(). And the pmd_bad() at the end shouldn't be necessary if everything is working right, I'm just tacking it on to be safe. I'll send a v2 with this change soon. (Alternatively, pmd_leaf() might be useful here, but then we'd have to figure an alternate way of doing this for the backport.)