On 6/20/19 6:08 PM, Yang Shi wrote: > > > On 6/20/19 12:18 AM, Vlastimil Babka wrote: >> On 6/19/19 8:19 PM, Yang Shi wrote: >>>>>> This is getting even more muddy TBH. Is there any reason that we >>>>>> have to >>>>>> handle this problem during the isolation phase rather the migration? >>>>> I think it was already said that if pages can't be isolated, then >>>>> migration phase won't process them, so they're just ignored. >>>> Yes,exactly. >>>> >>>>> However I think the patch is wrong to abort immediately when >>>>> encountering such page that cannot be isolated (AFAICS). IMHO it should >>>>> still try to migrate everything it can, and only then return -EIO. >>>> It is fine too. I don't see mbind semantics define how to handle such >>>> case other than returning -EIO. >> I think it does. There's: >> If MPOL_MF_MOVE is specified in flags, then the kernel *will attempt to >> move all the existing pages* ... If MPOL_MF_STRICT is also specified, >> then the call fails with the error *EIO if some pages could not be moved* >> >> Aborting immediately would be against the attempt to move all. >> >>> By looking into the code, it looks not that easy as what I thought. >>> do_mbind() would check the return value of queue_pages_range(), it just >>> applies the policy and manipulates vmas as long as the return value is 0 >>> (success), then migrate pages on the list. We could put the movable >>> pages on the list by not breaking immediately, but they will be ignored. >>> If we migrate the pages regardless of the return value, it may break the >>> policy since the policy will *not* be applied at all. >> I think we just need to remember if there was at least one page that >> failed isolation or migration, but keep working, and in the end return >> EIO if there was such page(s). I don't think it breaks the policy. Once >> pages are allocated in a mapping, changing the policy is a best effort >> thing anyway. > > The current behavior is: > If queue_pages_range() return -EIO (vma is not migratable, ignore other > conditions since we just focus on page migration), the policy won't be > set and no page will be migrated. Ah, I see. IIUC the current behavior is due to your recent commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified") in order to fix commit 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()"), which caused -EIO to be not returned enough. But I think you went too far and instead return -EIO too much. If I look at the code in parent commit of 6f4576e3687b, I can see in queue_pages_range(): if ((flags & MPOL_MF_STRICT) || ((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) && vma_migratable(vma))) { err = queue_pages_pgd_range(vma, start, endvma, nodes, flags, private); if (err) break; } and in queue_pages_pte_range(): if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) migrate_page_add(page, private, flags); else break; So originally, there was no returning of -EIO due to !vma_migratable() - as long as MPOL_MF_STRICT and MPOL_MF_MOVE* was specified, the code tried to queue for migration everything it could and didn't ever abort, AFAICS. And I still think that's the best possible behavior. > However, the problem here is the vma might look migratable, but some or > all the underlying pages are unmovable. So, my patch assumes the vma is > *not* migratable if at least one page is unmovable. I'm not sure if it > is possible to have both movable and unmovable pages for the same > mapping or not, I'm supposed the vma would be split much earlier. > > If we don't abort immediately, then we record if there is unmovable > page, then we could do: > #1. Still follows the current behavior (then why not abort immediately?) See above how the current behavior differs from the original one. > #2. Set mempolicy then migrate all the migratable pages. But, we may end > up with the pages on node A, but the policy says node B. Doesn't it > break the policy? The policy can already be "broken" (violated is probably better word) by migrate_pages() failing. If that happens, we don't rollback the migrated pages and reset the policy back, right? I think the manpage is clear that MPOL_MF_MOVE is a best-effort. Userspace will know that not everything was successfully migrated (via -EIO), and can take whatever steps it deems necessary - attempt rollback, determine which exact page(s) are violating the policy, etc. >> >>>> >