On Tue, 12 Jun 2018, Ricardo Neri wrote:

> +	/* There are no CPUs to monitor. */
> +	if (!cpumask_weight(&hdata->monitored_mask))
> +		return NMI_HANDLED;
> +
> 	inspect_for_hardlockups(regs);
>
> +	/*
> +	 * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> +	 * are added to and removed from this mask at cpu_up() and cpu_down(),
> +	 * respectively. Thus, the interrupt should be able to be moved to
> +	 * the next monitored CPU.
> +	 */
> +	spin_lock(&hld_data->lock);

Yuck. Taking a spinlock from NMI ...

> +	for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> +		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> +			break;

... and then calling into generic interrupt code, which will take even
more locks, is completely broken. Guess what happens when the NMI hits a
section where one of those locks is held? The NMI handler spins on a lock
that the code it interrupted can never release, and then you need another
watchdog to decode the lockup you just ran into.
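To make the failure mode concrete, here is a minimal userspace analogy
(a signal stands in for the NMI and a pthread spinlock for the kernel
lock; all names below are mine, not from the patch):

#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static pthread_spinlock_t lock;

/* The "NMI handler": grabs a lock the interrupted code may already hold. */
static void handler(int sig)
{
	pthread_spin_lock(&lock);	/* held below: spins forever */
	pthread_spin_unlock(&lock);
}

int main(void)
{
	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	signal(SIGALRM, handler);

	pthread_spin_lock(&lock);	/* enter the critical section */
	alarm(1);			/* the "NMI" fires while it is held */
	sleep(2);			/* handler runs here, never returns */
	pthread_spin_unlock(&lock);	/* never reached */
	return 0;
}

Build with "cc -o demo demo.c -lpthread" and the process pegs a CPU and
never exits. In the kernel the spinner is the NMI handler itself, so not
even the watchdog can report what happened.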
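The usual escape hatch for "I need to do locked work but I am in NMI
context" is irq_work: queue it from NMI (irq_work_queue() is NMI-safe)
and do the affinity change in the callback. A sketch under the patch's
names (hld_data, hdata); hld_next_cpu and hld_affinity_work are
hypothetical additions of mine, not a drop-in fix for this series:

#include <linux/irq_work.h>

static unsigned int hld_next_cpu;		/* picked by the NMI handler */
static struct irq_work hld_affinity_work;

static void hld_affinity_fn(struct irq_work *work)
{
	/* Plain hardirq context: taking the irq_desc locks is legal here. */
	irq_set_affinity(hld_data->irq, cpumask_of(hld_next_cpu));
}

/* Setup path, once:
 *
 *	init_irq_work(&hld_affinity_work, hld_affinity_fn);
 *
 * NMI handler: pick the next CPU from the mask (a plain read, no lock)
 * and punt the real work:
 *
 *	hld_next_cpu = cpumask_next(smp_processor_id(),
 *				    &hdata->monitored_mask);
 *	if (hld_next_cpu >= nr_cpu_ids)
 *		hld_next_cpu = cpumask_first(&hdata->monitored_mask);
 *	irq_work_queue(&hld_affinity_work);
 */

Thanks,

	tglx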