On 2019-10-30 at 12:32 Yang Shi wrote: > > >On 10/29/19 8:12 PM, Li Xinhai wrote: >> On 2019-10-30 at 10:50 Yang Shi wrote: >>> >>> On 10/29/19 7:27 PM, Li Xinhai wrote: >>>> One change in do_mbind() of this commit has suspicious usage of return value of >>>> queue_pages_range(), excerpt as below: >>>> >>>> --- >>>> @@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len, >>>> if (err) >>>> goto mpol_out; >>>> >>>> - err = queue_pages_range(mm, start, end, nmask, >>>> + ret = queue_pages_range(mm, start, end, nmask, >>>> flags | MPOL_MF_INVERT, &pagelist); >>>> - if (!err) >>>> - err = mbind_range(mm, start, end, new); >>>> + >>>> + if (ret < 0) { /////// convert to all possible 'ret' to '-EIO' <<<< >>>> + err = -EIO; >>>> + goto up_out; >>>> + } >>>> + >>>> + err = mbind_range(mm, start, end, new); >>>> >>>> if (!err) { >>>> int nr_failed = 0; >>>> --- >>>> >>>> Note that inside queue_pages_range(), the call to walk_page_range() may return >>>> errors from 'test_walk' of 'struct mm_walk_ops', e.g. -EFAULT. Now, those error >>>> codes are no longer reported to user space application. >>>> >>>> From user space, the mbind() call need to reported error, with EFAULT, as example: >>>> EFAULT >>>> Part or all of the memory range specified by nodemask and maxnode points >>>> outside your accessible address space. Or, there was an unmapped hole in the >>>> specified memory range specified by addr and len. >>> Thanks for catching this. That commit was aimed to correct the return >>> values for some corner cases in mbind(), but it should not alter the >>> errno for other failure cases, i.e. -EFAULT. >>> >>> Could you please try the below patch (build test only)? >>> >>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c >>> index 4ae967b..99df43a 100644 >>> --- a/mm/mempolicy.c >>> +++ b/mm/mempolicy.c >>> @@ -1286,7 +1286,7 @@ static long do_mbind(unsigned long start, unsigned >>> long len, >>> flags | MPOL_MF_INVERT, &pagelist); >>> >>> if (ret < 0) { >>> - err = -EIO; >>> + err = ret; >>> goto up_out; >>> } >>> >>> >> This seems do not work, because the 'pagelist' would have some pages queued >> into it, need to put back those pages instead of return quickly. >> >> So, we need to remove this page leak as well. <<<<<< >> >> In my understanding, revert the changes as I quoted above may solve it, but not sure >> the details about changes at end of do_mbind(), should keep them at there without >> further change? > >Thanks for pointing this out. We don't have to revert this commit to >handle the non-empty pagelist correctly. The simplest way is to just put >those pages back and I'm supposed this is also the preferred way since >mbind_range() is not called to really apply the policy so those pages >should not be migrated. > >The below patch should solve this: > >diff --git a/mm/mempolicy.c b/mm/mempolicy.c >index 4ae967b..d80025c 100644 >--- a/mm/mempolicy.c >+++ b/mm/mempolicy.c >@@ -1286,7 +1286,10 @@ static long do_mbind(unsigned long start, >unsigned long len, > flags | MPOL_MF_INVERT, &pagelist); > > if (ret < 0) { >- err = -EIO; >+ if (!list_empty(&pagelist)) >+ putback_movable_pages(&pagelist); >+ >+ err = ret; > goto up_out; > } > Checked the original commit about the 'ret >0' purpose, and understood that was for migrating page in best effort way. Your patch looks correct to me. >> >> - Xinhai >> >>>> Please correct me if this is the intended change(and will have updated API >>>> definition), or something was misunderstood. >>>> >>>> -Xinhai >> > > >