On Mon, Sep 25, 2023 at 10:16 AM Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 9/25/23 8:48 AM, Andrew Morton wrote:
> > On Wed, 20 Sep 2023 15:32:42 -0700 Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >> When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT,
> >> kernel should attempt to migrate all existing pages, and return -EIO if
> >> there is misplaced or unmovable page. Then commit 6f4576e3687b
> >> ("mempolicy: apply page table walker on queue_pages_range()") messed up
> >> the return value and didn't break VMA scan early anymore when MPOL_MF_STRICT
> >> alone. The return value problem was fixed by commit a7f40cfe3b7a
> >> ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified"),
> >> but it broke the VMA walk early if unmovable page is met, it may cause some
> >> pages are not migrated as expected.
> > So I'm thinking that a7f40cfe3b7a is the suitable Fixes: target?
>
> Yes, thanks. My follow-up email also added this.
>
> >
> >> The code should conceptually do:
> >>
> >>   if (MPOL_MF_MOVE|MOVEALL)
> >>       scan all vmas
> >>       try to migrate the existing pages
> >>       return success
> >>   else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
> >>       scan all vmas
> >>       try to migrate the existing pages
> >>       return -EIO if unmovable or migration failed
> >>   else /* MPOL_MF_STRICT alone */
> >>       break early if meets unmovable and don't call mbind_range() at all
> >>   else /* none of those flags */
> >>       check the ranges in test_walk, EFAULT without mbind_range() if discontig.

With this change I think my temporary fix at
https://lore.kernel.org/all/20230918211608.3580629-1-surenb@xxxxxxxxxx/
can be removed because we either scan all vmas (which means we locked
them all) or we break early and do not call mbind_range() at all (in
which case we don't need vmas to be locked).

> >>
> >> Fixed the behavior.
> >>
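
To make the four cases quoted above concrete, here is a rough userspace model of how I read them. It is a toy sketch, not the kernel code: toy_mbind(), struct toy_result, enum toy_page and the TOY_MF_* flags are all hypothetical names made up for illustration, the policy_applied field merely stands in for "mbind_range() was called", and the discontiguous-range (-EFAULT) check of the no-flags case is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPOL_MF_STRICT / MPOL_MF_MOVE / MPOL_MF_MOVE_ALL. */
#define TOY_MF_STRICT   (1u << 0)
#define TOY_MF_MOVE     (1u << 1)
#define TOY_MF_MOVE_ALL (1u << 2)

/* Toy page states: already on the target node, misplaced but movable,
 * or misplaced and unmovable. */
enum toy_page { TOY_PAGE_OK, TOY_PAGE_MOVABLE, TOY_PAGE_UNMOVABLE };

struct toy_result {
	int err;             /* 0 or a negative errno */
	size_t scanned;      /* pages visited by the walk */
	size_t migrated;     /* pages the model "migrated" */
	bool policy_applied; /* stand-in for mbind_range() having been called */
};

struct toy_result toy_mbind(unsigned int flags,
			    const enum toy_page *pages, size_t n)
{
	struct toy_result r = { 0, 0, 0, false };
	bool move = (flags & (TOY_MF_MOVE | TOY_MF_MOVE_ALL)) != 0;
	bool strict = (flags & TOY_MF_STRICT) != 0;
	bool failed = false;
	size_t i;

	assert(pages != NULL || n == 0);

	/* With none of the flags set, pages are not inspected at all. */
	for (i = 0; (move || strict) && i < n; i++) {
		r.scanned++;
		if (pages[i] == TOY_PAGE_OK)
			continue;
		if (!move) {
			/* STRICT alone: stop at the first misplaced page
			 * and never apply the policy. */
			r.err = -EIO;
			return r;
		}
		/* MOVE*: keep walking the whole range, remember failures. */
		if (pages[i] == TOY_PAGE_MOVABLE)
			r.migrated++;
		else
			failed = true;
	}

	/* Reaching this point means the whole range was walked (or no
	 * walk was needed), so the policy is applied. */
	r.policy_applied = true;

	/* Only MOVE* combined with STRICT reports the failure. */
	if (move && strict && failed)
		r.err = -EIO;
	return r;
}
```

The property that matters for the locking question is visible in the model: policy_applied is only ever true after the loop has either covered every page or been skipped entirely, and the early return for STRICT alone leaves it false.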