On Wed, Mar 3, 2021 at 6:25 PM Michal Hocko <mhocko@xxxxxxxx> wrote: > > On Wed 03-03-21 17:39:56, Muchun Song wrote: > > For simplification 991e7673859e ("mm: memcontrol: account kernel stack > > per node") has changed the per zone vmalloc backed stack pages > > accounting to per node. By doing that we have lost a certain precision > > because those pages might live in different NUMA nodes. In the end > > NR_KERNEL_STACK_KB exported to the userspace might be over estimated on > > some nodes while underestimated on others. > > > > This doesn't impose any real problem to correctnes of the kernel > > behavior as the counter is not used for any internal processing but it > > can cause some confusion to the userspace. > > You have skipped over one part of the changelog I have proposed and that > is to provide an actual data. Because this is a problem I found by looking at the code, not a real world problem. I do not have any actual data. :-( > > > Address the problem by accounting each vmalloc backing page to its own > > node. > > > > Fixes: 991e7673859e ("mm: memcontrol: account kernel stack per node") > > Fixes tag might make somebody assume this is worth backporting but I > highly doubt so. OK. I can remove the Fixes tag. > > > Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx> > > Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx> > > Anyway > Acked-by: Michal Hocko <mhocko@xxxxxxxx> Thanks for your review. > > as the patch is correct with one comment below > > > --- > > Changelog in v2: > > - Rework commit log suggested by Michal. > > > > Thanks to Michal and Shakeel for review. > > > > kernel/fork.c | 15 ++++++++++----- > > 1 file changed, 10 insertions(+), 5 deletions(-) > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index d66cd1014211..6e2201feb524 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -379,14 +379,19 @@ static void account_kernel_stack(struct task_struct *tsk, int account) > > void *stack = task_stack_page(tsk); > > struct vm_struct *vm = task_stack_vm_area(tsk); > > > > + if (vm) { > > + int i; > > > > - /* All stack pages are in the same node. */ > > - if (vm) > > - mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB, > > - account * (THREAD_SIZE / 1024)); > > - else > > + BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE); > > I do not think we need this BUG_ON. What kind of purpose does it serve? vm->nr_pages should be always equal to THREAD_SIZE / PAGE_SIZE if the system is not corrupted. It makes sense to remove the BUG_ON. I will remove it in the next version. Thanks. > > > + > > + for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) > > + mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB, > > + account * (PAGE_SIZE / 1024)); > > + } else { > > + /* All stack pages are in the same node. */ > > mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB, > > account * (THREAD_SIZE / 1024)); > > + } > > } > > > > static int memcg_charge_kernel_stack(struct task_struct *tsk) > > -- > > 2.11.0 > > -- > Michal Hocko > SUSE Labs