On Fri, 2 Jun 2023, Jann Horn wrote: > On Fri, Jun 2, 2023 at 6:37 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > > The most obvious vital thing (in the split ptlock case) is that it > > remains a struct page with a usable ptl spinlock embedded in it. > > > > The question becomes more urgent when/if extending to replacing the > > pagetable pmd by huge pmd in one go, without any mmap_lock: powerpc > > wants to deposit the page table for later use even in the shmem/file > > case (and all arches in the anon case): I did work out the details once > > before, but I'm not sure whether I would still agree with myself; and was > > glad to leave replacement out of this series, to revisit some time later. > > > > > > > > So in particular, in handle_pte_fault() we can reach the "if > > > (unlikely(!pte_same(*vmf->pte, entry)))" with vmf->pte pointing to a > > > detached zeroed page table, but we're okay with that because in that > > > case we know that !pte_none(vmf->orig_pte)&&pte_none(*vmf->pte) , > > > which implies !pte_same(*vmf->pte, entry) , which means we'll bail > > > out? > > > > There is no current (even at end of series) circumstance in which we > > could be pointing to a detached page table there; but yes, I want to > > allow for that, and yes I agree with your analysis. > > Hmm, what am I missing here? I spent quite a while trying to reconstruct what I had been thinking, what meaning of "detached" or "there" I had in mind when I asserted so confidently "There is no current (even at end of series) circumstance in which we could be pointing to a detached page table there". But had to give up and get on with more useful work. Of course you are right, and that is what this series is about. Hugh > > static vm_fault_t handle_pte_fault(struct vm_fault *vmf) > { > pte_t entry; > > if (unlikely(pmd_none(*vmf->pmd))) { > [not executed] > } else { > /* > * A regular pmd is established and it can't morph into a huge > * pmd by anon khugepaged, since that takes mmap_lock in write > * mode; but shmem or file collapse to THP could still morph > * it into a huge pmd: just retry later if so. > */ > vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd, > vmf->address, &vmf->ptl); > if (unlikely(!vmf->pte)) > [not executed] > // this reads a present readonly PTE > vmf->orig_pte = ptep_get_lockless(vmf->pte); > vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID; > > if (pte_none(vmf->orig_pte)) { > [not executed] > } > } > > [at this point, a concurrent THP collapse operation detaches the page table] > // vmf->pte now points into a detached page table > > if (!vmf->pte) > [not executed] > > if (!pte_present(vmf->orig_pte)) > [not executed] > > if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) > [not executed] > > spin_lock(vmf->ptl); > entry = vmf->orig_pte; > // vmf->pte still points into a detached page table > if (unlikely(!pte_same(*vmf->pte, entry))) { > update_mmu_tlb(vmf->vma, vmf->address, vmf->pte); > goto unlock; > } > [...] > }