Hugh Dickins <hughd@xxxxxxxxxx> writes:

> migrate_vma_collect_pmd(): remove the pmd_trans_unstable() handling after
> splitting huge zero pmd, and the pmd_none() handling after successfully
> splitting huge page: those are now managed inside pte_offset_map_lock(),
> and by "goto again" when it fails.
>
> But the skip after unsuccessful split_huge_page() must stay: it avoids an
> endless loop. The skip when pmd_bad()? Remove that: it will be treated
> as a hole rather than a skip once cleared by pte_offset_map_lock(), but
> with different timing that would be so anyway; and it's arguably best to
> leave the pmd_bad() handling centralized there.

So for a pmd_bad() entry the sequence would be:

1. pte_offset_map_lock() would return NULL and clear the PMD.
2. "goto again" then sees pmd_none() and marks the range as a migrating
   hole.
3. In migrate_vma_insert_page() a new PMD is created by pmd_alloc().
4. This leads to a new zero page getting mapped for the previously
   pmd_bad() mapping.

I'm not entirely sure what the pmd_bad() case is used for, but is that
OK? I understand that previously it was all a matter of timing, but I
wouldn't rely on the previous code being correct in this regard either.
(See the rough sketch of the new collect path at the end of this mail.)

> migrate_vma_insert_page(): remove comment on the old pte_offset_map()
> and old locking limitations; remove the pmd_trans_unstable() check and
> just proceed to pte_offset_map_lock(), aborting when it fails (page has
> now been charged to memcg, but that's so in other cases, and presumably
> uncharged later).

Correct, the non-migrating page will be freed later via put_page(),
which will uncharge the page.

> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> ---
>  mm/migrate_device.c | 31 ++++---------------------------
>  1 file changed, 4 insertions(+), 27 deletions(-)
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index d30c9de60b0d..a14af6b12b04 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -83,9 +83,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>  		if (is_huge_zero_page(page)) {
>  			spin_unlock(ptl);
>  			split_huge_pmd(vma, pmdp, addr);
> -			if (pmd_trans_unstable(pmdp))
> -				return migrate_vma_collect_skip(start, end,
> -								walk);
>  		} else {
>  			int ret;
>
> @@ -100,16 +97,12 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>  			if (ret)
>  				return migrate_vma_collect_skip(start, end,
>  								walk);
> -			if (pmd_none(*pmdp))
> -				return migrate_vma_collect_hole(start, end, -1,
> -								walk);
>  		}
>  	}
>
> -	if (unlikely(pmd_bad(*pmdp)))
> -		return migrate_vma_collect_skip(start, end, walk);
> -
>  	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> +	if (!ptep)
> +		goto again;
>  	arch_enter_lazy_mmu_mode();
>
>  	for (; addr < end; addr += PAGE_SIZE, ptep++) {
> @@ -595,27 +588,10 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
>  	pmdp = pmd_alloc(mm, pudp, addr);
>  	if (!pmdp)
>  		goto abort;
> -
>  	if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
>  		goto abort;
> -
> -	/*
> -	 * Use pte_alloc() instead of pte_alloc_map(). We can't run
> -	 * pte_offset_map() on pmds where a huge pmd might be created
> -	 * from a different thread.
> -	 *
> -	 * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when
> -	 * parallel threads are excluded by other means.
> -	 *
> -	 * Here we only have mmap_read_lock(mm).
> -	 */
>  	if (pte_alloc(mm, pmdp))
>  		goto abort;
> -
> -	/* See the comment in pte_alloc_one_map() */
> -	if (unlikely(pmd_trans_unstable(pmdp)))
> -		goto abort;
> -
>  	if (unlikely(anon_vma_prepare(vma)))
>  		goto abort;
>  	if (mem_cgroup_charge(page_folio(page), vma->vm_mm, GFP_KERNEL))
>  		goto abort;
> @@ -650,7 +626,8 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
>  	}
>
>  	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
> -
> +	if (!ptep)
> +		goto abort;
>  	if (check_stable_address_space(mm))
>  		goto unlock_abort;
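
For reference, this is my mental model of the collect path after this
patch. It is a simplified sketch rather than the literal
mm/migrate_device.c code (locals, locking details and the THP
split/skip handling are elided), with the step numbers from the
sequence above marked in comments:

  static int migrate_vma_collect_pmd(pmd_t *pmdp, unsigned long start,
				     unsigned long end, struct mm_walk *walk)
  {
	struct mm_struct *mm = walk->vma->vm_mm;
	unsigned long addr = start;
	spinlock_t *ptl;
	pte_t *ptep;

  again:
	if (pmd_none(*pmdp))
		/* (2) the retry sees pmd_none() and reports a hole */
		return migrate_vma_collect_hole(start, end, -1, walk);

	if (pmd_trans_huge(*pmdp)) {
		/* split the huge pmd, or skip on a failed split, as before */
	}

	/* (1) clears a pmd_bad() entry and returns NULL */
	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
	if (!ptep)
		goto again;

	/* ... per-pte collection loop ... */
  }

If that reading is right, steps 3 and 4 then follow on the
migrate_vma_pages() side, when migrate_vma_insert_page() repopulates
what used to be a pmd_bad() entry.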