Re: bug: move_pages(2) does not udpate "status" if no pages are moved

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 4, 2019 at 3:45 PM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>
> On 12/4/19 12:04 PM, Yang Shi wrote:
> > On Wed, Dec 4, 2019 at 11:21 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote:
> >>
> >> On 12/4/19 11:01 AM, Felix Abecassis wrote:
> >>> Hello all,
> >>>
> >>
> >> Hi Felix,
> >>
> >> Thanks for writing up a very clear description of the problem.
> >>
> >>> On kernel 5.3, when using the move_pages syscall (wrapped by libnuma) and all
> >>> pages happen to be on the right node already, this function returns 0 but the
> >>> "status" array is not updated. This array potentially contains garbage values
> >>> (e.g. from malloc(3)), and I don't see a way to detect this.
> >>
> >>
> >> The way to detect this case would be to zero the array before calling move_pages().
> >> Then, if move_pages() returns 0, and the array remains full of zeroes, you can
> >> conclude that move_pages() "succeeded", and that there were no errors for any
> >> of the pages. So the pages are where you requested them to end up.
> >
> > I don't think we can just simply return all zeros here. It looks the
> > status should contain error code or the target node id if the page is
> > moved to that node successfully. So, if the page is already on the
> > requested node, the status should contain the current node id, but the
> > current node maybe not 0.
> >
> > So, IMHO it sounds like a valid bug.
> >
>
> Yes, you're right, a more precise reading of the man page does support that:
> if move_pages() returns 0, then the status array *must* contain valid node IDs.
> I see. (Felix also mentioned the same thing, in a side note.)
>
> Looking some more at both the man page and Felix's report (and the kernel
> implementation), it seems like there are maybe two bugs here:
>
> 1) Not setting the status array in some cases, if some pages were not moved for
> "non fatal" reasons, and
>
> 2) Returning success if no pages were moved. The "ERRORS" section of the man
> page seems to require that ENOENT be returned in that case. Although, you could
> perhaps argue that this statement is only unidirectional. In other words, maybe
> ENOENT happens, but it doesn't *have* to happen, if all pages were already on the
> target node.
>
> ERRORS
>
> ENOENT No pages were found that require moving.  All pages are either already
>        on the target node, not present, had an  invalid  address  or could not
>        be moved because they were mapped by multiple processes.

Thanks for pointing this out. Yes, the man page says so, but the code
doesn't do so at all. I dig into the very first commit
742755a1d8ce2b548428f7aacf1758b4bba50080 ("[PATCH] page migration:
sys_move_pages(): support moving of individual pages"), however, it
didn't return -ENOENT if the pages are already on the target node if I
read the code correctly.

Added Chris in this thread who is the original author of this API.


>
>
> thanks,
> --
> John Hubbard
> NVIDIA




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux