On Thu, Jul 25, 2024 at 3:41 PM Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Thu, Jul 25, 2024 at 11:29:49AM -0700, James Houghton wrote: > > > - pages += change_pmd_range(tlb, vma, pud, addr, next, newprot, > > > + > > > + if (pud_leaf(pud)) { > > > + if ((next - addr != PUD_SIZE) || > > > + pgtable_split_needed(vma, cp_flags)) { > > > + __split_huge_pud(vma, pudp, addr); > > > + goto again; > > > > IIUC, most of the time, we're just going to end up clearing the PUD in > > this case. __split_huge_pud() will just clear it, then we goto again > > and `continue` to the next pudp. Is that ok? > > > > (I think it's ok as long as: you never map an anonymous page with a > > PUD, > > I think this is true. > > > and that uffd-wp is not usable with non-hugetlb PUD mappings of > > user memory (which I think is only DAX?). > > Uffd-wp has the async mode that can even work with dax puds.. even though I > don't think anyone should be using it. Just like I'm more sure that nobody > is using mprotect() too with dax pud, and it further justifies why nobody > cared this much.. > > What uffd-wp would do in this case is it'll make pgtable_split_needed() > returns true on this PUD, the PUD got wiped out, goto again, then > change_prepare() will populate this pud with a pgtable page. Then it goes > downwards, install PMD pgtable, then probably start installing pte markers > ultimately if it's a wr-protect operation. Ah, I didn't understand this patch correctly. You're right, you'll install PMDs underneath -- I thought we'd just find pud_none() and bail out. So this all makes sense, thanks! > > > So it seems ok today...?) > > Yes I think it's ok so far, unless you think it's not. :) > > > > > Also, does the comment in pgtable_split_needed() need updating? > > /* > * Return true if we want to split THPs into PTE mappings in change > * protection procedure, false otherwise. > */ > > It looks to me it's ok for now to me? THP can represents PUD in dax, and we > indeed want to break THPs (no matter PUD/PMD) finally into PTE mappings. Oh, heh I was thinking of the other comment: /* * pte markers only resides in pte level, if we need pte markers, * we need to split. We cannot wr-protect shmem thp because file * thp is handled differently when split by erasing the pmd so far. */ I guess this is fine too, it just kind of reads as if this function is only called for PMDs. *shrug* > > > > > Somewhat related question: change_huge_pmd() is very careful not to > > clear the PMD before writing the new value. Yet change_pmd_range(), > > when it calls into __split_huge_pmd(), will totally clear the PMD and > > then populate the PTEs underneath (in some cases at least), seemingly > > reintroducing the MADV_DONTNEED concern. But your PUD version, because > > it never re-populates the PUD (or PMDs/PTEs underneath) does not have > > this issue. WDYT? I guess I'm wrong about this -- your PUD version is the same as the PMD version in this respect: both clear the PUD/PMD and then install lower page table entries. > > Could you elaborate more on the DONTNEED issue you're mentioning here? In change_huge_pmd(), there is a comment about not clearing the pmd before setting the new value. I guess the issue is: if a user is MADV_DONTNEEDing some memory and happens to see a cleared PMD, it will just move on. But in this case, if change_huge_pmd() temporarily cleared a PMD, then MADV_DONTNEED saw it and moved on, and then change_huge_pmd() installed a new PMD, the MADV_DONTNEED has basically skipped over the PMD, probably not what the user wanted. It seems like we have the same issue when a PMD is cleared when we're splitting it. But yeah, given that your PUD version is apparently no different from the PMD version in this respect, maybe this question isn't very interesting. :) > > > > > Thanks for this series! > > Thanks for reviewing it, James.