Re: [PATCH v5 01/13] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP

"Zach O'Keefe" <zokeefe@xxxxxxxxxx> · Thu, 19 May 2022 14:06:25 -0700

Thanks again for the review, Peter.

On Wed, May 18, 2022 at 11:41 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Wed, May 04, 2022 at 02:44:25PM -0700, Zach O'Keefe wrote:
> > +static int find_pmd_or_thp_or_none(struct mm_struct *mm,
> > +                                unsigned long address,
> > +                                pmd_t **pmd)
> > +{
> > +     pmd_t pmde;
> > +
> > +     *pmd = mm_find_pmd_raw(mm, address);
> > +     if (!*pmd)
> > +             return SCAN_PMD_NULL;
> > +
> > +     pmde = pmd_read_atomic(*pmd);
>
> It seems to be correct on using the atomic fetcher here.  Though irrelevant
> to this patchset.. does it also mean that we miss that on mm_find_pmd()?  I
> meant a separate fix like this one:
>
> ---8<---
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 69416072b1a6..61309718640f 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -785,7 +785,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
>          * without holding anon_vma lock for write.  So when looking for a
>          * genuine pmde (in which to find pte), test present and !THP together.
>          */
> -       pmde = *pmd;
> +       pmde = pmd_read_atomic(pmd);
>         barrier();
>         if (!pmd_present(pmde) || pmd_trans_huge(pmde))
>                 pmd = NULL;
> ---8<---
>
> As otherwise it seems it's also prone to PAE race conditions when reading
> pmd out, but I could be missing something.
>

This is a good question. I took some time to look into this, but it's
very complicated and unfortunately I couldn't reach a conclusion in
the time I allotted to myself. My working (unverified) assumption is
that mm_find_pmd() is called in places where it doesn't care if the
pmd isn't read atomically. If so, does that also mean MADV_COLLAPSE is
safe? I'm not sure. These i386 PAE + THP racing issues were most
recently discussed when considering if READ_ONCE() should be used
instead of pmd_read_atomic() [1].

[1] https://lore.kernel.org/linux-mm/594c1f0-d396-5346-1f36-606872cddb18@xxxxxxxxxx/

> > +
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +     /* See comments in pmd_none_or_trans_huge_or_clear_bad() */
> > +     barrier();
> > +#endif
> > +     if (!pmd_present(pmde))
> > +             return SCAN_PMD_NULL;
> > +     if (pmd_trans_huge(pmde))
> > +             return SCAN_PMD_MAPPED;
>
> Would it be safer to check pmd_bad()?  I think not all mm pmd paths check
> that, frankly I don't really know what's the major cause of a bad pmd
> (either software bugs or corrupted mem), but just to check with you,
> because potentially a bad pmd can be read as SCAN_SUCCEED and go through.
>

Likewise, I'm not sure what the cause of "bad pmds" is.

Do you mean to check pmd_bad() instead of pmd_trans_huge()? I.e. b/c a
pmd-mapped thp counts as "bad" (at least on x86 since PSE set) or do
you mean to additionally check pmd_bad() after the pmd_trans_huge()
check?

If it's the former, I'd say we can't claim !pmd_bad() == memory
already backed by thps / our job here is done.

If it's the latter, I don't see it hurting much (but I can't argue
intelligently about why it's needed) and can include the check in v6.

Thanks again,
Zach

> > +     return SCAN_SUCCEED;
> > +}
>
> The rest looks good to me, thanks.
>
> --
> Peter Xu
>