On Mon, 10 Jul 2023, Zi Yan wrote: > On 8 Jun 2023, at 21:11, Hugh Dickins wrote: > > > filemap_map_pages() allow pte_offset_map_lock() to fail; and remove the > > pmd_devmap_trans_unstable() check from filemap_map_pmd(), which can safely > > return to filemap_map_pages() and let pte_offset_map_lock() discover that. > > > > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> > > --- > > mm/filemap.c | 12 +++++------- > > 1 file changed, 5 insertions(+), 7 deletions(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 28b42ee848a4..9e129ad43e0d 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -3408,13 +3408,6 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio, > > if (pmd_none(*vmf->pmd)) > > pmd_install(mm, vmf->pmd, &vmf->prealloc_pte); > > > > - /* See comment in handle_pte_fault() */ > > - if (pmd_devmap_trans_unstable(vmf->pmd)) { > > - folio_unlock(folio); > > - folio_put(folio); > > - return true; > > - } > > - > > There is a pmd_trans_huge() check at the beginning, should it be removed > as well? Since pte_offset_map_lock() is also able to detect it. It probably could be removed: but mostly I avoided such cleanups, in the hope that the patches could be more easily reviewed as safe. But I was eager to delete that obscure pmd_devmap_trans_unstable(). The whole strategy of dealing with the pmd_trans_huge()-like cases first, and only finally arriving at the pte_offset_map_lock() when other cases have been excluded, could be reversed in *many* places. It had to be that way before, because pte_offset_map_lock() could only cope with a page table; but now we could reverse them to do the pte_offset_map_lock() first, and only try the other cases when it fails. That would in theory be more efficient; but whether measurably more efficient I doubt. And very easy to introduce errors on the way: my enthusiasm for such cleanups is low! But maybe there's a few places where the rearrangement would be worthwhile. > > > return false; > > } > > > > @@ -3501,6 +3494,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, > > > > addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT); > > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl); > > + if (!vmf->pte) { > > + folio_unlock(folio); > > + folio_put(folio); > > + goto out; > > + } > > do { > > again: > > page = folio_file_page(folio, xas.xa_index); > > -- > > 2.35.3 > > These two changes affect the ret value. Before, pmd_devmap_trans_unstable() == true > made ret = VM_FAULT_NPAGE, but now ret is the default 0 value. So ret should be set > to VM_FAULT_NPAGE before goto out in the second hunk? Qi Zheng raised a similar question on the original posting, I answered https://lore.kernel.org/linux-mm/fb9a9d57-dbd7-6a6e-d1cb-8dcd64c829a6@xxxxxxxxxx/ It's a rare case to fault here, then find pmd_devmap(*pmd), and it really doesn't matter whether we return VM_FAULT_NOPAGE or 0 for it - maybe I've left it inconsistent between THP and devmap, but it doesn't really matter. I haven't checked Matthew's v5 "new page table range API" posted today, but I expect this all looks different here anyway. Thanks a lot for checking these: they are now in 6.5-rc1, so if you find something that needs fixing, all the more important that we do fix it. Hugh