On Fri, 15 Sep 2023, Matthew Wilcox wrote:
> On Thu, Sep 14, 2023 at 09:26:15PM -0700, Hugh Dickins wrote:
> > On Thu, 14 Sep 2023, Suren Baghdasaryan wrote:
> > > Yes, I just finished running the reproducer on both upstream and
> > > linux-next builds listed in
> > > https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023 and the
> > > problem does not happen anymore.
> > > I'm fine with your suggestion too, just wanted to point out it would
> > > introduce change in the behavior. Let me know how you want to proceed.
> >
> > Well done, identifying the mysterious cause of this problem:
> > I'm glad to hear that you've now verified that hypothesis.
> >
> > You're right, it would be a regression to follow Matthew's suggestion.
> >
> > Traditionally, modulo bugs and inconsistencies, the queue_pages_range()
> > phase of do_mbind() has done the best it can, gathering all the pages it
> > can that need migration, even if some were missed; and proceeds to do the
> > mbind_range() phase if there was nothing "seriously" wrong (a gap causing
> > -EFAULT). Then at the end, if MPOL_MF_STRICT was set, and not all the
> > pages could be migrated (or MOVE was not specified and not all pages
> > were well placed), it returns -EIO rather than 0 to inform the caller
> > that not all could be done.
> >
> > There have been numerous tweaks, but I think most importantly
> > 5.3's d883544515aa ("mm: mempolicy: make the behavior consistent when
> > MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") added those "return 1"s
> > which stop the pagewalk early. In my opinion, not an improvement - makes
> > it harder to get mbind() to do the best job it can (or is it justified as
> > what you're asking for if you say STRICT?).
>
> I suspect you agree that it's inconsistent to stop early. Userspace
> doesn't know at which point we found an unmovable page, so it can't behave
> rationally. Perhaps we should remove the 'early stop' and attempt to
> migrate every page in the range, whether it's before or after the first
> unmovable page?

Yes, that's what I was arguing for, and how it was done in olden days.

Though (after Yang Shi's following comments, and looking back at my
last attempted patch here) I may disagree with myself about the right
behavior in the MPOL_MF_STRICT case.

Hugh
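
To make the difference between the two walks concrete, here is a rough
userspace sketch, not kernel code. Everything named sketch_* or SKETCH_*
is hypothetical and invented for illustration; the flag values are not the
real MPOL_MF_* values, and a page that is queued is assumed to migrate
successfully, which the real code cannot assume. It only models which
pages the walk gathers and when -EIO comes back.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: NOT the kernel's flag values or types. */
#define SKETCH_MF_STRICT 0x1u
#define SKETCH_MF_MOVE   0x2u

struct sketch_page {
	bool misplaced;	/* not on a node the new policy allows */
	bool movable;	/* could be isolated and queued for migration */
};

struct sketch_result {
	size_t visited;	/* pages the walk looked at */
	size_t queued;	/* pages gathered for migration */
	int ret;	/* 0, or -EIO if STRICT and something was missed */
};

/*
 * Walk the range once.  With early_stop == false this is the "do the
 * best it can" behavior: keep gathering after a page is missed.  With
 * early_stop == true the walk gives up at the first missed page when
 * STRICT is set, roughly what the "return 1"s are described as doing.
 */
static struct sketch_result
sketch_walk(const struct sketch_page *pages, size_t n,
	    unsigned int flags, bool early_stop)
{
	struct sketch_result r = { 0, 0, 0 };
	bool missed = false;

	for (size_t i = 0; i < n; i++) {
		r.visited++;
		if (!pages[i].misplaced)
			continue;	/* already well placed */
		if ((flags & SKETCH_MF_MOVE) && pages[i].movable) {
			r.queued++;	/* gathered for migration */
			continue;
		}
		/* Misplaced, and either MOVE is unset or it cannot move. */
		missed = true;
		if (early_stop && (flags & SKETCH_MF_STRICT))
			break;
	}

	/* Only STRICT turns "not all could be done" into an error. */
	if (missed && (flags & SKETCH_MF_STRICT))
		r.ret = -EIO;
	return r;
}
```

In this model both walks return -EIO for a range with one unmovable page
in the middle, but the early-stop walk never looks at the pages after it,
so the caller cannot tell how much of the range was actually handled.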