On Wed, Apr 5, 2023 at 11:10 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Wed, Apr 05, 2023 at 09:59:15AM -0700, Yang Shi wrote: > > On Wed, Apr 5, 2023 at 8:51 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > > > Khugepaged collapse an anonymous thp in two rounds of scans. The 2nd round > > > done in __collapse_huge_page_isolate() after hpage_collapse_scan_pmd(), > > > during which all the locks will be released temporarily. It means the > > > pgtable can change during this phase before 2nd round starts. > > > > > > It's logically possible some ptes got wr-protected during this phase, and > > > we can errornously collapse a thp without noticing some ptes are > > > wr-protected by userfault. e1e267c7928f wanted to avoid it but it only did > > > that for the 1st phase, not the 2nd phase. > > > > > > Since __collapse_huge_page_isolate() happens after a round of small page > > > swapins, we don't need to worry on any !present ptes - if it existed > > > khugepaged will already bail out. So we only need to check present ptes > > > with uffd-wp bit set there. > > > > > > This is something I found only but never had a reproducer, I thought it was > > > one caused a bug in Muhammad's recent pagemap new ioctl work, but it turns > > > out it's not the cause of that but an userspace bug. However this seems to > > > still be a real bug even with a very small race window, still worth to have > > > it fixed and copy stable. > > > > Yeah, I agree. But I got confused by userfaultfd_wp(vma) and > > pte_uffd_wp(pte). If a vma is armed with uffd wp, shall we skip the > > whole vma? If so, whether it is better to just check vma? We do > > revalidate vma once reacquiring mmap_lock, so we should be able to > > bail out earlier. > > Checking against VMA is safe too, the difference is current code still > allows thp to be collapsed as long as none of the page is explicitly > protected over the thp range, even if the range is registered with > userfault-wp. That's also what e1e267c7928f does. > > Here we have slightly different handling between anon / file thps (file > thps checks against the vma flags), IMHO mostly because we don't scan > pgtables when making decisions to collapse a shmem thp, so we made it > simple by checking against vma flags. We can make it the same as anon but > it might be an overkill just to scan the entries for uffd-wp purpose. > > For anon we always scans the pgtable anyway so it's easier to make a more > accurate decision. Aha, I see. It does resolve my confusion. Thanks for elaborating. The patch looks good to me. Reviewed-by: Yang Shi <shy828301@xxxxxxxxx> > > Thanks, > > -- > Peter Xu >