On Wed, Feb 26, 2025 at 10:17 AM Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 2025/2/26 19:48, Brian Geffon wrote:
> > When handling faults for anon shmem, finish_fault() will attempt to install
> > ptes for the entire folio. Unfortunately, if it encounters a single
> > non-pte_none entry in that range it will bail, even if the pte that
> > triggered the fault is still pte_none. When this situation happens the
> > fault will be retried endlessly, never making forward progress.
> >
> > This patch fixes this behavior: if it detects that a pte in the range
> > is not pte_none, it will fall back to setting just the pte for the
> > address that triggered the fault.
>
> Could you describe in detail how this situation occurs? How is the none
> pte inserted within the range of the large folio? Because we have checks
> in shmem to determine if a large folio is suitable.

We're seeing it because of racing shmem_undo_range() calls: basically
a portion of the folio's pte range is zapped, and that prevents
finish_fault() from ever completing because it assumes the entire
range must be pte_none.

>
> Anyway, if we find the pte_range_none() is false, we can fall back to
> per-page fault as the following code shows (untested), which seems
> simpler?

Yah, I like this approach. I'll send a v2 with you as a Suggested-by.

>
> diff --git a/mm/memory.c b/mm/memory.c
> index a8196ae72e9a..8a2a9fda5410 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5219,7 +5219,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>          bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>                        !(vma->vm_flags & VM_SHARED);
>          int type, nr_pages;
> -       unsigned long addr = vmf->address;
> +       unsigned long addr;
> +       bool fallback_per_page = false;
> +
> +
> +fallback:
> +       addr = vmf->address;
>
>          /* Did we COW the page? */
>          if (is_cow)
> @@ -5258,7 +5263,8 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>           * approach also applies to non-anonymous-shmem faults to avoid
>           * inflating the RSS of the process.
>           */
> -       if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> +       if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))
> +           || unlikely(fallback_per_page)) {
>                  nr_pages = 1;
>          } else if (nr_pages > 1) {
>                  pgoff_t idx = folio_page_idx(folio, page);
> @@ -5294,9 +5300,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>                  ret = VM_FAULT_NOPAGE;
>                  goto unlock;
>          } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> -               update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
> -               ret = VM_FAULT_NOPAGE;
> -               goto unlock;
> +               fallback_per_page = true;
> +               pte_unmap_unlock(vmf->pte, vmf->ptl);
> +               goto fallback;
>          }
>
>          folio_ref_add(folio, nr_pages - 1);
>
> >
> > Cc: stable@xxxxxxxxxxxxxxx
> > Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> > Fixes: 43e027e41423 ("mm: memory: extend finish_fault() to support large folio")
> > Reported-by: Marek Maslanka <mmaslanka@xxxxxxxxxx>
> > Signed-off-by: Brian Geffon <bgeffon@xxxxxxxxxx>
> > ---
> >   mm/memory.c | 19 ++++++++++++++++---
> >   1 file changed, 16 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index b4d3d4893267..32de626ec1da 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5258,9 +5258,22 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >                  ret = VM_FAULT_NOPAGE;
> >                  goto unlock;
> >          } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> > -               update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
> > -               ret = VM_FAULT_NOPAGE;
> > -               goto unlock;
> > +               /*
> > +                * We encountered a set pte, let's just try to install the
> > +                * pte for the original fault if that pte is still pte_none.
> > +                */
> > +               pgoff_t idx = (vmf->address - addr) / PAGE_SIZE;
> > +
> > +               if (!pte_none(ptep_get_lockless(vmf->pte + idx))) {
> > +                       update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
> > +                       ret = VM_FAULT_NOPAGE;
> > +                       goto unlock;
> > +               }
> > +
> > +               vmf->pte = vmf->pte + idx;
> > +               page = folio_page(folio, idx);
> > +               addr = vmf->address;
> > +               nr_pages = 1;
> >          }
> >
> >          folio_ref_add(folio, nr_pages - 1);
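
For reference, here is a rough userspace sketch of the shape of the race
being described. This is hypothetical and untested as written (not our
actual reproducer): it assumes shmem large folios are enabled (e.g.
/sys/kernel/mm/transparent_hugepage/shmem_enabled set to "always") so the
shared anonymous mapping is backed by multi-page folios, and it relies on
madvise(MADV_REMOVE) reaching shmem_undo_range() to zap part of a folio's
pte range while another thread keeps faulting the same region:

/*
 * Hypothetical sketch only -- untested, not a verified reproducer.
 * Build with: gcc -O2 -pthread race-sketch.c
 */
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE	(2UL * 1024 * 1024)	/* room for a large folio */
#define PAGE		4096UL

static volatile int stop;

/* Repeatedly fault pages of the region in. */
static void *toucher(void *arg)
{
	volatile char *p = arg;
	unsigned long off;

	while (!stop)
		for (off = 0; off < REGION_SIZE; off += PAGE)
			p[off] = 1;
	return NULL;
}

/* Repeatedly punch a hole over part of the region; for shmem-backed
 * memory MADV_REMOVE goes through shmem_undo_range(). */
static void *puncher(void *arg)
{
	char *p = arg;

	while (!stop)
		madvise(p + REGION_SIZE / 4, REGION_SIZE / 4, MADV_REMOVE);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	char *p;

	p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	pthread_create(&t1, NULL, toucher, p);
	pthread_create(&t2, NULL, puncher, p);

	/* If the fault livelock hits, the toucher never finishes a pass. */
	sleep(30);
	stop = 1;
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	munmap(p, REGION_SIZE);
	return 0;
}

If the livelock triggers, the touching thread stops making forward
progress (it keeps re-taking the same fault on a pte_none entry whose
surrounding folio range still has mapped ptes), so the joins never
complete.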