On Wed, Dec 4, 2019 at 3:45 PM John Hubbard <jhubbard@xxxxxxxxxx> wrote: > > On 12/4/19 12:04 PM, Yang Shi wrote: > > On Wed, Dec 4, 2019 at 11:21 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote: > >> > >> On 12/4/19 11:01 AM, Felix Abecassis wrote: > >>> Hello all, > >>> > >> > >> Hi Felix, > >> > >> Thanks for writing up a very clear description of the problem. > >> > >>> On kernel 5.3, when using the move_pages syscall (wrapped by libnuma) and all > >>> pages happen to be on the right node already, this function returns 0 but the > >>> "status" array is not updated. This array potentially contains garbage values > >>> (e.g. from malloc(3)), and I don't see a way to detect this. > >> > >> > >> The way to detect this case would be to zero the array before calling move_pages(). > >> Then, if move_pages() returns 0, and the array remains full of zeroes, you can > >> conclude that move_pages() "succeeded", and that there were no errors for any > >> of the pages. So the pages are where you requested them to end up. > > > > I don't think we can just simply return all zeros here. It looks the > > status should contain error code or the target node id if the page is > > moved to that node successfully. So, if the page is already on the > > requested node, the status should contain the current node id, but the > > current node maybe not 0. > > > > So, IMHO it sounds like a valid bug. > > > > Yes, you're right, a more precise reading of the man page does support that: > if move_pages() returns 0, then the status array *must* contain valid node IDs. > I see. (Felix also mentioned the same thing, in a side note.) > > Looking some more at both the man page and Felix's report (and the kernel > implementation), it seems like there are maybe two bugs here: > > 1) Not setting the status array in some cases, if some pages were not moved for > "non fatal" reasons, and > > 2) Returning success if no pages were moved. The "ERRORS" section of the man > page seems to require that ENOENT be returned in that case. Although, you could > perhaps argue that this statement is only unidirectional. In other words, maybe > ENOENT happens, but it doesn't *have* to happen, if all pages were already on the > target node. > > ERRORS > > ENOENT No pages were found that require moving. All pages are either already > on the target node, not present, had an invalid address or could not > be moved because they were mapped by multiple processes. Thanks for pointing this out. Yes, the man page says so, but the code doesn't do so at all. I dig into the very first commit 742755a1d8ce2b548428f7aacf1758b4bba50080 ("[PATCH] page migration: sys_move_pages(): support moving of individual pages"), however, it didn't return -ENOENT if the pages are already on the target node if I read the code correctly. Added Chris in this thread who is the original author of this API. > > > thanks, > -- > John Hubbard > NVIDIA