On Mon, Nov 25, 2024 at 12:33:52PM +0100, Michal Hocko wrote:
> On Sun 24-11-24 03:09:35, Junjie Fu wrote:
> > When handling a page fault caused by NUMA balancing (do_numa_page), it is
> > necessary to decide whether to migrate the current page to another node or
> > keep it on its current node. For pages with the MPOL_PREFERRED memory
> > policy, it is sufficient to check whether the first node set in the
> > nodemask is the same as the node where the page is currently located. If
> > this is the case, the page should remain where it is. Otherwise,
> > migration to another node should be attempted.
> >
> > This is because MPOL_PREFERRED is defined as follows: "This mode sets the
> > preferred node for allocation. The kernel will try to allocate pages from
> > this node first and fall back to nearby nodes if the preferred node is low
> > on free memory. If the nodemask specifies more than one node ID, the first
> > node in the mask will be selected as the preferred node."
> >
> > Thus, if the node where the current page resides is not the first node in
> > the nodemask, it is not the PREFERRED node, and memory migration can be
> > attempted.
> >
> > However, in the original code, the check only verifies whether the current
> > node exists in the nodemask (it may or may not be the first node in the
> > mask). This could lead to a scenario where, if the current node is not the
> > first node in the nodemask, the code incorrectly decides not to attempt
> > migration to other nodes.
> >
> > This behavior is clearly incorrect. If the target node for migration and
> > the page's current NUMA node are both within the nodemask but neither is
> > the first node, they should be treated with the same priority, and
> > migration attempts should proceed.
>
> The code is clearly confusing, but is there any actual problem to be
> solved? IIRC, although we do keep a nodemask for the MPOL_PREFERRED
> policy, we do not allow more than a single node to be set there.
> Have a look at mpol_new_preferred.

I concur here - the proposed patch doesn't actually change any behavior
(or it shouldn't, at least). Is there a migration error being observed
that this patch fixes, or is this just an `observational fix`?

~Gregory
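
For reference, mpol_new_preferred() is why the two checks coincide. A
minimal sketch of the relevant logic, paraphrased from mm/mempolicy.c
(the exact code differs across kernel versions, so treat this as
illustrative rather than authoritative):

	/* Sketch: creating an MPOL_PREFERRED policy. Only the first
	 * node of the user-supplied mask is kept. */
	static int mpol_new_preferred(struct mempolicy *pol,
				      const nodemask_t *nodes)
	{
		if (nodes_empty(*nodes))
			return -EINVAL;
		/* Collapse the mask to a single bit: its first node. */
		pol->nodes = nodemask_of_node(first_node(*nodes));
		return 0;
	}

	/* Sketch: the check under discussion, an excerpt from the
	 * policy switch in mpol_misplaced(). */
	case MPOL_PREFERRED:
		if (node_isset(curnid, pol->nodes))
			goto out;	/* already on the preferred node */
		polnid = first_node(pol->nodes);
		break;

Since mpol_new_preferred() collapses the user-supplied mask to its
first node, pol->nodes holds exactly one node for MPOL_PREFERRED, and
node_isset(curnid, pol->nodes) is true exactly when
curnid == first_node(pol->nodes) - so rewriting the check in terms of
first_node() cannot change behavior.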