Hi,

On Tue, Feb 27, 2024 at 11:22 PM Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx> wrote:
>
> When the watchdog determines that the current soft lockup is due
> to an interrupt storm based on CPU utilization, reporting the
> most frequent interrupts could be good enough for further
> troubleshooting.
>
> Below is an example of interrupt storm. The call tree does not
> provide useful information, but we can analyze which interrupt
> caused the soft lockup by comparing the counts of interrupts.
>
> [ 638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0]
> [ 638.870825] CPU#9 Utilization every 4s during lockup:
> [ 638.871194] #1: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.871652] #2: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.872107] #3: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.872563] #4: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.873018] #5: 0% system, 0% softirq, 100% hardirq, 0% idle
> [ 638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
> [ 638.873994] #1: 330945 irq#7
> [ 638.874236] #2: 31 irq#82
> [ 638.874493] #3: 10 irq#10
> [ 638.874744] #4: 2 irq#89
> [ 638.874992] #5: 1 irq#102
> ...
> [ 638.875313] Call trace:
> [ 638.875315] __do_softirq+0xa8/0x364
>
> Signed-off-by: Bitao Hu <yaoma@xxxxxxxxxxxxxxxxx>
> Reviewed-by: Liu Song <liusong@xxxxxxxxxxxxxxxxx>
> ---
> kernel/watchdog.c | 115 ++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 111 insertions(+), 4 deletions(-)

Reviewed-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
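
For anyone reading along, the idea of "comparing the counts of interrupts" can be sketched in plain user-space C roughly as below. This is only an illustration, not the code from the patch: struct irq_delta, cmp_delta_desc() and top_irqs() are hypothetical names, and the sketch assumes two snapshots of cumulative per-IRQ counters indexed by IRQ number, one taken before the sampling window and one after.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: an IRQ number and how often it fired in the window. */
struct irq_delta {
	unsigned int irq;
	unsigned long count;
};

/* qsort() comparator: larger counts first, ties broken by lower IRQ number. */
static int cmp_delta_desc(const void *a, const void *b)
{
	const struct irq_delta *x = a;
	const struct irq_delta *y = b;

	if (x->count != y->count)
		return x->count < y->count ? 1 : -1;
	return (x->irq > y->irq) - (x->irq < y->irq);
}

/*
 * Write up to n_top of the most frequent IRQs into out[].
 * before[i] and after[i] are cumulative counts for IRQ number i (assumed
 * layout). IRQs that did not fire in the window are skipped.
 * Returns the number of entries written.
 */
static size_t top_irqs(const unsigned long *before, const unsigned long *after,
		       size_t nr_irqs, struct irq_delta *out, size_t n_top)
{
	struct irq_delta *all;
	size_t i, n = 0;

	assert(before && after && out);
	if (nr_irqs == 0 || n_top == 0)
		return 0;

	all = malloc(nr_irqs * sizeof(*all));
	if (!all)
		return 0;

	for (i = 0; i < nr_irqs; i++) {
		/* Unsigned subtraction stays correct across counter wraparound. */
		unsigned long d = after[i] - before[i];

		if (d) {
			all[n].irq = (unsigned int)i;
			all[n].count = d;
			n++;
		}
	}

	qsort(all, n, sizeof(*all), cmp_delta_desc);

	if (n > n_top)
		n = n_top;
	for (i = 0; i < n; i++)
		out[i] = all[i];

	free(all);
	return n;
}
```

Given snapshots whose differences match the log quoted above, a call with n_top = 5 would rank irq#7 first with a count of 330945. Sorting all the deltas keeps the sketch short; code running in the watchdog path would more likely track only the top few entries in a small fixed-size array and avoid allocating.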