On Mon, Jan 13, 2025 at 7:36 PM David Wang <00107082@xxxxxxx> wrote:
>
> Hi,
>
>
> At 2025-01-14 05:56:23, "Suren Baghdasaryan" <surenb@xxxxxxxxxx> wrote:
> >On Mon, Jan 13, 2025 at 12:04 AM David Wang <00107082@xxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> One more update:
> >>
> >> When I boot up my system, no alloc_percpu is accounted in kernel/sched/topology.c
> >>
> >>       996   14 kernel/sched/topology.c:2275 func:__sdt_alloc 80
> >>       996   14 kernel/sched/topology.c:2266 func:__sdt_alloc 80
> >>        96    6 kernel/sched/topology.c:2259 func:__sdt_alloc 80
> >>     12388   24 kernel/sched/topology.c:2252 func:__sdt_alloc 80
> >>       612    1 kernel/sched/topology.c:1961 func:sched_init_numa 1
> >>
> >> And then after suspend/resume, those alloc_percpu entries show up.
> >>
> >>       996   14 kernel/sched/topology.c:2275 func:__sdt_alloc 395
> >>       996   14 kernel/sched/topology.c:2266 func:__sdt_alloc 395
> >>        96    6 kernel/sched/topology.c:2259 func:__sdt_alloc 395
> >>     12388   24 kernel/sched/topology.c:2252 func:__sdt_alloc 395
> >>         0    0 kernel/sched/topology.c:2242 func:__sdt_alloc 70 <---
> >>         0    0 kernel/sched/topology.c:2238 func:__sdt_alloc 70 <---
> >>         0    0 kernel/sched/topology.c:2234 func:__sdt_alloc 70 <---
> >>         0    0 kernel/sched/topology.c:2230 func:__sdt_alloc 70 <---
> >>       612    1 kernel/sched/topology.c:1961 func:sched_init_numa 1
> >>
> >> I have my accumulative counter patch applied and filter out items with a 0 accumulative counter.
> >> I am almost sure the patch would not cause this accounting issue, but not 100%.....
> >
> >Have you tested this without your accumulative counter patch?
> >IIUC, that patch filters out any allocation which has never been hit.
> >So, if suspend/resume path contains allocations which were never hit
> >before then those allocations would become suddenly visible, like in
> >your case. That's why I'm against filtering allocinfo data in the
> >kernel. Please try this without your patch and see if the data becomes
> >more consistent.
>
> I removed all my patches and built a 6.13.0-rc7 kernel.
> After boot up:
>        64    1 kernel/sched/topology.c:2579 func:alloc_sched_domains
>       896   14 kernel/sched/topology.c:2275 func:__sdt_alloc
>       896   14 kernel/sched/topology.c:2266 func:__sdt_alloc
>        96    6 kernel/sched/topology.c:2259 func:__sdt_alloc
>     12288   24 kernel/sched/topology.c:2252 func:__sdt_alloc
>         0    0 kernel/sched/topology.c:2242 func:__sdt_alloc
>         0    0 kernel/sched/topology.c:2238 func:__sdt_alloc
>         0    0 kernel/sched/topology.c:2234 func:__sdt_alloc
>         0    0 kernel/sched/topology.c:2230 func:__sdt_alloc
>       512    1 kernel/sched/topology.c:1961 func:sched_init_numa
>
> And after suspend/resume, no change detected:
>        64    1 kernel/sched/topology.c:2579 func:alloc_sched_domains
>       896   14 kernel/sched/topology.c:2275 func:__sdt_alloc
>       896   14 kernel/sched/topology.c:2266 func:__sdt_alloc
>        96    6 kernel/sched/topology.c:2259 func:__sdt_alloc
>     12288   24 kernel/sched/topology.c:2252 func:__sdt_alloc
>         0    0 kernel/sched/topology.c:2242 func:__sdt_alloc
>         0    0 kernel/sched/topology.c:2238 func:__sdt_alloc
>         0    0 kernel/sched/topology.c:2234 func:__sdt_alloc
>         0    0 kernel/sched/topology.c:2230 func:__sdt_alloc
>       512    1 kernel/sched/topology.c:1961 func:sched_init_numa
>
> I also built an image with the accumulative counter, but no filter.
>
> After boot up:
>        64    1 kernel/sched/topology.c:2579 func:alloc_sched_domains 2
>       896   14 kernel/sched/topology.c:2275 func:__sdt_alloc 80
>       896   14 kernel/sched/topology.c:2266 func:__sdt_alloc 80
>        96    6 kernel/sched/topology.c:2259 func:__sdt_alloc 80
>     12288   24 kernel/sched/topology.c:2252 func:__sdt_alloc 80
>         0    0 kernel/sched/topology.c:2242 func:__sdt_alloc 0 <---this *0* seems wrong
>         0    0 kernel/sched/topology.c:2238 func:__sdt_alloc 0
>         0    0 kernel/sched/topology.c:2234 func:__sdt_alloc 0
>         0    0 kernel/sched/topology.c:2230 func:__sdt_alloc 0
>       512    1 kernel/sched/topology.c:1961 func:sched_init_numa 1
>
> And then suspend/resume:
>        64    1 kernel/sched/topology.c:2579 func:alloc_sched_domains 17
>       896   14 kernel/sched/topology.c:2275 func:__sdt_alloc 395
>       896   14 kernel/sched/topology.c:2266 func:__sdt_alloc 395
>        96    6 kernel/sched/topology.c:2259 func:__sdt_alloc 395
>     12288   24 kernel/sched/topology.c:2252 func:__sdt_alloc 395
>         0    0 kernel/sched/topology.c:2242 func:__sdt_alloc 70
>         0    0 kernel/sched/topology.c:2238 func:__sdt_alloc 70
>         0    0 kernel/sched/topology.c:2234 func:__sdt_alloc 70
>         0    0 kernel/sched/topology.c:2230 func:__sdt_alloc 70
>       512    1 kernel/sched/topology.c:1961 func:sched_init_numa 1
>
> Reading the code, those allocations should be tied together:
> if kzalloc_node at line#2252 happened, then alloc_percpu at line#2230 should also have happened.

Hmm, ok. Looks like early calls to alloc_percpu() are not being
registered somehow. Could you please share your cumulative counter
patch with me? I'll try to reproduce this locally and see if I can
spot the issue.

>
> kernel/sched/topology.c
> 2230                 sdd->sd = alloc_percpu(struct sched_domain *);
> 2231                 if (!sdd->sd)
> 2232                         return -ENOMEM;
> ...
> 2246                 for_each_cpu(j, cpu_map) {
> ...
> 2252                         sd = kzalloc_node(sizeof(struct sched_domain) + cpumask_size(),
> 2253                                         GFP_KERNEL, cpu_to_node(j));
> ...
> 2257                         *per_cpu_ptr(sdd->sd, j) = sd;
>
>
> But somehow during bootup, those alloc_percpu calls in kernel/sched/topology.c:__sdt_alloc were missed in profiling.
> (I don't mean to sell the idea of the accumulative counter again here, but it does help sometimes. :)
>
> >Thanks,
> >Suren.
>
> Thanks
> David
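
In case it helps when comparing snapshots like the ones quoted above, here is a rough userspace sketch in C of the consistency check being discussed: a dependent site (kzalloc_node at line 2252) has been hit while the site it depends on (alloc_percpu at line 2230) still shows zero cumulative calls. It assumes rows laid out exactly as pasted in this thread, where the trailing cumulative column only exists with the out-of-tree accumulative counter patch. Rows that carry extra fields, such as a module name, are not handled. The names alloc_row, parse_row, total_at and missed_accounting are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row as pasted in this thread. The last column is assumed to come
 * from the out-of-tree accumulative counter patch, so it is optional. */
struct alloc_row {
    long long bytes;            /* bytes currently allocated */
    unsigned long long calls;   /* live allocations */
    char file[128];
    int line;
    char func[64];
    long long total_calls;      /* cumulative calls, -1 if column absent */
};

/* Parse "<bytes> <calls> <file>:<line> func:<name> [<total>]".
 * Returns 1 on success, 0 if the text does not look like such a row. */
static int parse_row(const char *text, struct alloc_row *row)
{
    assert(text != NULL && row != NULL);
    row->total_calls = -1;
    int n = sscanf(text, "%lld %llu %127[^:]:%d func:%63s %lld",
                   &row->bytes, &row->calls, row->file, &row->line,
                   row->func, &row->total_calls);
    return n >= 5;
}

/* Cumulative calls recorded for file:line, or -1 if the row is missing
 * or has no cumulative column. */
static long long total_at(const struct alloc_row *rows, size_t count,
                          const char *file, int line)
{
    for (size_t i = 0; i < count; i++)
        if (rows[i].line == line && strcmp(rows[i].file, file) == 0)
            return rows[i].total_calls;
    return -1;
}

/* Returns 1 when the dependent site was hit at least once while the
 * prerequisite site reports exactly zero cumulative calls. */
static int missed_accounting(const struct alloc_row *rows, size_t count,
                             const char *file, int prereq_line,
                             int dependent_line)
{
    long long pre = total_at(rows, count, file, prereq_line);
    long long dep = total_at(rows, count, file, dependent_line);
    return pre == 0 && dep > 0;
}
```

Fed the two relevant rows from the boot snapshot (80 at line 2252, 0 at line 2230), missed_accounting() flags the pair. With the post-resume rows (395 and 70) it does not, which only says the counters are no longer zero, not that the early calls were recovered.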