On Tue, Jul 23, 2024 at 2:16 PM Anshuman Khandual <anshuman.khandual@xxxxxxx> wrote:
>
>
> On 7/23/24 11:02, Zhongkun He wrote:
> > I found a problem on my test machine: the memory of a process is
> > repeatedly migrated back and forth between two nodes and never stops.
> >
> > 1. Test steps and the machine.
> > ------------
> > VM machine: 4 NUMA nodes with 10GB per node.
> >
> > stress --vm 1 --vm-bytes 12g --vm-keep
> >
> > The numa_stat info:
> > while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> > anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> > anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> > anon N0=98304 N1=0 N2=9937256448 N3=2947825664
> > anon N0=98304 N1=0 N2=8863514624 N3=4021567488
> > anon N0=98304 N1=0 N2=7789772800 N3=5095309312
> > anon N0=98304 N1=0 N2=6716030976 N3=6169051136
> > anon N0=98304 N1=0 N2=5642289152 N3=7242792960
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=4837007360 N3=8048074752
> > anon N0=98304 N1=0 N2=3763265536 N3=9121816576
> > anon N0=98304 N1=0 N2=2689523712 N3=10195558400
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > anon N0=98304 N1=0 N2=3320455168 N3=9564626944
> > anon N0=98304 N1=0 N2=4394196992 N3=8490885120
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=6174195712 N3=6710886400
> > anon N0=98304 N1=0 N2=7247937536 N3=5637144576
> > anon N0=98304 N1=0 N2=8321679360 N3=4563402752
> > anon N0=98304 N1=0 N2=9395421184 N3=3489660928
> > anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> > anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> >
> > 2. Root cause:
> > Since commit 3e32158767b0 ("mm/mprotect.c: don't touch single threaded
> > PTEs which are on the right node"), the PTEs of local pages will not be
> > changed in change_pte_range() for a single-threaded process, so no
> > NUMA page-fault information will be generated in do_numa_page(). If a
> > single-threaded process has memory on another node, it will
> > unconditionally migrate all of its local memory to that node,
> > even if the remote node holds only one page.
> >
> > So, let's fix it. The memory of a single-threaded process should follow
> > the CPU, not the NUMA faults info, in order to avoid memory thrashing.
> >
> > After a long period of testing, there has been no memory thrashing
> > from the beginning.
> >
> > while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> > anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> >
> > V1:
> > -- Add the test results (numa stats) from Ying's feedback
> >
> > Signed-off-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
> > Acked-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
> > ---
> >  kernel/sched/fair.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 24dda708b699..d7cbbda568fb 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2898,6 +2898,12 @@ static void task_numa_placement(struct task_struct *p)
> >                 numa_group_count_active_nodes(ng);
> >                 spin_unlock_irq(group_lock);
> >                 max_nid = preferred_group_nid(p, max_nid);
> > +       } else if (atomic_read(&p->mm->mm_users) == 1) {
> > +               /*
> > +                * The memory of a single-threaded process should
> > +                * follow the CPU in order to avoid memory thrashing.
> > +                */
> > +               max_nid = numa_node_id();
> >         }
> >
> >         if (max_faults) {
>
> This in fact makes sense for a single-threaded process, but just
> wondering: could there be any other unwanted side effects?

Hi Anshuman,

This fix only applies to single-threaded processes because of the check
'atomic_read(&p->mm->mm_users) == 1', so I don't think there are any
other side effects.
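
For what it's worth, here is a rough, paraphrased sketch of the other half
of the interaction: the check that commit 3e32158767b0 added to the
prot_numa path of change_pte_range() in mm/mprotect.c. This is simplified
from memory, not the exact upstream code; the real function has more
checks and newer kernels use folio helpers instead of struct page:

        int target_node = NUMA_NO_NODE;

        /* Only single-threaded private mappings get a target node. */
        if (prot_numa && !(vma->vm_flags & VM_SHARED) &&
            atomic_read(&vma->vm_mm->mm_users) == 1)
                target_node = numa_node_id();

        /* ... then, for each PTE in the range: ... */
        if (prot_numa) {
                struct page *page = vm_normal_page(vma, addr, oldpte);

                if (!page || PageKsm(page))
                        continue;

                /* Already PROT_NONE, nothing to do. */
                if (pte_protnone(oldpte))
                        continue;

                /*
                 * Local pages of a single-threaded task are left
                 * untouched, so do_numa_page() never runs for them and
                 * no NUMA fault statistics are recorded for the local
                 * node.
                 */
                if (target_node == page_to_nid(page))
                        continue;
        }

Because local pages are skipped here, do_numa_page() only ever fires for
remote pages, so task_numa_placement() sees faults concentrated on the
remote node and keeps choosing it as the preferred node. The
'max_nid = numa_node_id()' fallback in the patch above sidesteps that for
the mm_users == 1 case.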