Hi,

On Sun, Jul 30, 2023 at 6:24 PM kernel test robot <lkp@xxxxxxxxx> wrote:
>
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> head:   5d0c230f1de8c7515b6567d9afba1f196fb4e2f4
> commit: 77c12fc95980d100fdc49e88a5727c242d0dfedc watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()
> date:   7 weeks ago
> config: x86_64-intel-next-customedconfig-intel_next_rpm_defconfig (https://download.01.org/0day-ci/archive/20230731/202307310955.pLZDhpnl-lkp@xxxxxxxxx/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce: (https://download.01.org/0day-ci/archive/20230731/202307310955.pLZDhpnl-lkp@xxxxxxxxx/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202307310955.pLZDhpnl-lkp@xxxxxxxxx/
>
> All warnings (new ones prefixed by >>):
>
>    kernel/watchdog.c: In function 'watchdog_hardlockup_check':
> >> kernel/watchdog.c:162:1: warning: the frame size of 1248 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>      162 | }
>          | ^
>
>
> vim +162 kernel/watchdog.c
>
> 81972551df9d16 Douglas Anderson 2023-05-19  116
> 77c12fc95980d1 Douglas Anderson 2023-05-19  117  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> 81972551df9d16 Douglas Anderson 2023-05-19  118  {
> 1610611aadc224 Douglas Anderson 2023-05-19  119      /*
> 1610611aadc224 Douglas Anderson 2023-05-19  120       * Check for a hardlockup by making sure the CPU's timer
> 1610611aadc224 Douglas Anderson 2023-05-19  121       * interrupt is incrementing. The timer interrupt should have
> 81972551df9d16 Douglas Anderson 2023-05-19  122       * fired multiple times before we overflow'd. If it hasn't
> 81972551df9d16 Douglas Anderson 2023-05-19  123       * then this is a good indication the cpu is stuck
> 81972551df9d16 Douglas Anderson 2023-05-19  124       */
> 77c12fc95980d1 Douglas Anderson 2023-05-19  125      if (is_hardlockup(cpu)) {
> 1610611aadc224 Douglas Anderson 2023-05-19  126          unsigned int this_cpu = smp_processor_id();
> 77c12fc95980d1 Douglas Anderson 2023-05-19  127          struct cpumask backtrace_mask = *cpu_online_mask;

Ah, so I assume the problem is right here, where I put a "struct
cpumask" on the stack. FWIW, the direct copy above changed to:

    struct cpumask backtrace_mask;

    cpumask_copy(&backtrace_mask, cpu_online_mask);

...in commit 7a71d8e650b0 ("watchdog/hardlockup: in
watchdog_hardlockup_check() use cpumask_copy()"), but that won't
change the stack usage...

The failing config says:

    CONFIG_NR_CPUS=8192

That means this structure takes 8192 / 8 = 1024 bytes, so just having
it on the stack at all is enough to trigger the warning. Indeed,
grepping through the source code shows that putting a `struct cpumask`
on the stack is not common, presumably because of this exact issue.

OK, so this doesn't look too hard to solve. We only ever do the
backtrace on one CPU anyway (see the test and set of
`watchdog_hardlockup_all_cpu_dumped`), so I can just make the structure
`static` and only touch it if we're actually dumping the stack. It'll
use up 1KB of "static" memory all the time if you've set max CPUs to
8K, but that seems better than trying to do a malloc when we know the
system is hard locked.

Patch to fix this is posted at:

https://lore.kernel.org/r/20230731091754.1.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid

-Doug
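
For illustration only, here is a minimal userspace sketch of the idea
described above. It is not the posted patch and not kernel code. Every
`model_` / `MODEL_` name is hypothetical: `struct model_cpumask` is a
stand-in for `struct cpumask` sized for CONFIG_NR_CPUS=8192, `memcpy()`
stands in for `cpumask_copy()`, and a C11 `atomic_flag` stands in for
the test and set of `watchdog_hardlockup_all_cpu_dumped`.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: mirrors CONFIG_NR_CPUS=8192 from the failing config. */
#define MODEL_NR_CPUS 8192
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_MASK_LONGS \
	((MODEL_NR_CPUS + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* Hypothetical stand-in for struct cpumask: one bit per possible CPU. */
struct model_cpumask {
	unsigned long bits[MODEL_MASK_LONGS];
};

/* 8192 bits / 8 bits per byte = 1024 bytes: the entire frame budget. */
static_assert(sizeof(struct model_cpumask) == MODEL_NR_CPUS / 8,
	      "mask should be NR_CPUS / 8 bytes");

/* Stand-in for watchdog_hardlockup_all_cpu_dumped. */
static atomic_flag model_all_cpu_dumped = ATOMIC_FLAG_INIT;

/* Returns true if bit `cpu` is set in the mask. */
static bool model_mask_test_cpu(const struct model_cpumask *m,
				unsigned int cpu)
{
	return (m->bits[cpu / MODEL_BITS_PER_LONG] >>
		(cpu % MODEL_BITS_PER_LONG)) & 1UL;
}

/*
 * Copy `online` into a function-local static mask, but only for the
 * first caller that wins the test-and-set. Later callers get NULL and
 * never touch the mask. The 1 KB lives in static storage, so it adds
 * nothing to this function's stack frame.
 */
static const struct model_cpumask *
model_prepare_backtrace_mask(const struct model_cpumask *online)
{
	static struct model_cpumask backtrace_mask;

	/* atomic_flag_test_and_set() returns the previous value. */
	if (atomic_flag_test_and_set(&model_all_cpu_dumped))
		return NULL;

	memcpy(&backtrace_mask, online, sizeof(backtrace_mask));
	return &backtrace_mask;
}
```

The first call to `model_prepare_backtrace_mask()` returns a pointer to
the filled-in static mask; any later call returns NULL. That
single-winner property is what makes a shared static buffer safe in
this sketch, at the cost of 1 KB of always-reserved memory.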