On Fri, 5 Jul 2024 01:48:21 -0700 Saurabh Sengar <ssengar@xxxxxxxxxxxxxxxxxxx> wrote:

> refresh_zone_stat_thresholds function has two loops which is expensive for
> higher number of CPUs and NUMA nodes.
>
> Below is the rough estimation of total iterations done by these loops
> based on number of NUMA and CPUs.
>
> Total number of iterations: nCPU * 2 * Numa * mCPU
> Where:
> nCPU = total number of CPUs
> Numa = total number of NUMA nodes
> mCPU = mean value of total CPUs (e.g., 512 for 1024 total CPUs)
>
> For the system under test with 16 NUMA nodes and 1024 CPUs, this
> results in a substantial increase in the number of loop iterations
> during boot-up when NUMA is enabled:
>
> No NUMA = 1024*2*1*512 = 1,048,576 : Here refresh_zone_stat_thresholds
> takes around 224 ms total for all the CPUs in the system under test.
> 16 NUMA = 1024*2*16*512 = 16,777,216 : Here refresh_zone_stat_thresholds
> takes around 4.5 seconds total for all the CPUs in the system under test.

Did you measure the overall before-and-after times?  IOW, how much of
that 4.5s do we reclaim?

> Calling this for each CPU is expensive when there are large number
> of CPUs along with multiple NUMAs. Fix this by deferring
> refresh_zone_stat_thresholds to be called later at once when all the
> secondary CPUs are up. Also, register the DYN hooks to keep the
> existing hotplug functionality intact.
>

Seems risky - we'll now have online CPUs which have uninitialized data,
yes?  What assurance do we have that this data won't be accessed?

Another approach might be to make the code a bit smarter - instead of
calculating thresholds for the whole world, we make incremental changes
to the existing thresholds on behalf of the new resource which just
became available?
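
For reference, the iteration estimate quoted above can be reproduced with
a few lines of userspace Python.  This is only a sketch of the arithmetic,
not of the kernel loops: the name estimate_iterations is made up for
illustration, and mCPU is assumed to be nCPU / 2, following the "512 for
1024 total CPUs" example in the quoted text.

```python
def estimate_iterations(ncpu, numa_nodes):
    """Rough loop-iteration estimate from the quoted text: nCPU * 2 * Numa * mCPU."""
    # Assumption: mCPU (mean CPU count seen while CPUs come up one by one)
    # is taken as nCPU / 2, e.g. 512 for 1024 CPUs.
    mean_cpus = ncpu // 2
    return ncpu * 2 * numa_nodes * mean_cpus


# The two configurations mentioned in the quoted text.
for nodes in (1, 16):
    total = estimate_iterations(1024, nodes)
    print(f"{nodes:2d} NUMA node(s): {total:,} iterations")
```

The estimate grows 16x between the two cases, while the quoted times go
from about 224 ms to about 4.5 s, which is closer to 20x, so the formula
should be read as a rough guide rather than an exact cost model.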