Thanks again for the review, Peter. On Wed, May 18, 2022 at 11:41 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Wed, May 04, 2022 at 02:44:25PM -0700, Zach O'Keefe wrote: > > +static int find_pmd_or_thp_or_none(struct mm_struct *mm, > > + unsigned long address, > > + pmd_t **pmd) > > +{ > > + pmd_t pmde; > > + > > + *pmd = mm_find_pmd_raw(mm, address); > > + if (!*pmd) > > + return SCAN_PMD_NULL; > > + > > + pmde = pmd_read_atomic(*pmd); > > It seems to be correct on using the atomic fetcher here. Though irrelevant > to this patchset.. does it also mean that we miss that on mm_find_pmd()? I > meant a separate fix like this one: > > ---8<--- > diff --git a/mm/rmap.c b/mm/rmap.c > index 69416072b1a6..61309718640f 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -785,7 +785,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) > * without holding anon_vma lock for write. So when looking for a > * genuine pmde (in which to find pte), test present and !THP together. > */ > - pmde = *pmd; > + pmde = pmd_read_atomic(pmd); > barrier(); > if (!pmd_present(pmde) || pmd_trans_huge(pmde)) > pmd = NULL; > ---8<--- > > As otherwise it seems it's also prone to PAE race conditions when reading > pmd out, but I could be missing something. > This is a good question. I took some time to look into this, but it's very complicated and unfortunately I couldn't reach a conclusion in the time I allotted to myself. My working (unverified) assumption is that mm_find_pmd() is called in places where it doesn't care if the pmd isn't read atomically. If so, does that also mean MADV_COLLAPSE is safe? I'm not sure. These i386 PAE + THP racing issues were most recently discussed when considering if READ_ONCE() should be used instead of pmd_read_atomic() [1]. [1] https://lore.kernel.org/linux-mm/594c1f0-d396-5346-1f36-606872cddb18@xxxxxxxxxx/ > > + > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > > + /* See comments in pmd_none_or_trans_huge_or_clear_bad() */ > > + barrier(); > > +#endif > > + if (!pmd_present(pmde)) > > + return SCAN_PMD_NULL; > > + if (pmd_trans_huge(pmde)) > > + return SCAN_PMD_MAPPED; > > Would it be safer to check pmd_bad()? I think not all mm pmd paths check > that, frankly I don't really know what's the major cause of a bad pmd > (either software bugs or corrupted mem), but just to check with you, > because potentially a bad pmd can be read as SCAN_SUCCEED and go through. > Likewise, I'm not sure what the cause of "bad pmds" is. Do you mean to check pmd_bad() instead of pmd_trans_huge()? I.e. b/c a pmd-mapped thp counts as "bad" (at least on x86 since PSE set) or do you mean to additionally check pmd_bad() after the pmd_trans_huge() check? If it's the former, I'd say we can't claim !pmd_bad() == memory already backed by thps / our job here is done. If it's the latter, I don't see it hurting much (but I can't argue intelligently about why it's needed) and can include the check in v6. Thanks again, Zach > > + return SCAN_SUCCEED; > > +} > > The rest looks good to me, thanks. > > -- > Peter Xu >