On Tue, 12 Jun 2018, Ricardo Neri wrote:

> +	/* There are no CPUs to monitor. */
> +	if (!cpumask_weight(&hdata->monitored_mask))
> +		return NMI_HANDLED;
> +
> 	inspect_for_hardlockups(regs);
>
> +	/*
> +	 * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> +	 * are added to and removed from this mask at cpu_up() and cpu_down(),
> +	 * respectively. Thus, the interrupt should be able to be moved to
> +	 * the next monitored CPU.
> +	 */
> +	spin_lock(&hld_data->lock);

Yuck. Taking a spinlock from NMI ...

> +	for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> +		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> +			break;

... and then calling into generic interrupt code, which will take even
more locks, is completely broken. Guess what happens when the NMI hits a
section where one of those locks is held? The NMI handler spins on a lock
that the code it interrupted can never release, and then you need another
watchdog to decode the lockup you just ran into.
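To make the failure mode concrete, here is a minimal userspace analogy
(a signal stands in for the NMI and a pthread spinlock for the kernel
lock; all names below are mine, not from the patch):

#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static pthread_spinlock_t lock;

/* The "NMI handler": grabs a lock the interrupted code may already hold. */
static void handler(int sig)
{
	pthread_spin_lock(&lock);	/* held below: spins forever */
	pthread_spin_unlock(&lock);
}

int main(void)
{
	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
	signal(SIGALRM, handler);

	pthread_spin_lock(&lock);	/* enter the critical section */
	alarm(1);			/* the "NMI" fires while it is held */
	sleep(2);			/* handler runs here, never returns */
	pthread_spin_unlock(&lock);	/* never reached */
	return 0;
}

Build with "cc -o demo demo.c -lpthread" and the process pegs a CPU and
never exits. In the kernel the spinner is the NMI handler itself, so not
even the watchdog can report what happened.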
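The usual escape hatch for "I need to do locked work but I am in NMI
context" is irq_work: queue it from NMI (irq_work_queue() is NMI-safe)
and do the affinity change in the callback. A sketch under the patch's
names (hld_data, hdata); hld_next_cpu and hld_affinity_work are
hypothetical additions of mine, not a drop-in fix for this series:

#include <linux/irq_work.h>

static unsigned int hld_next_cpu;		/* picked by the NMI handler */
static struct irq_work hld_affinity_work;

static void hld_affinity_fn(struct irq_work *work)
{
	/* Plain hardirq context: taking the irq_desc locks is legal here. */
	irq_set_affinity(hld_data->irq, cpumask_of(hld_next_cpu));
}

/* Setup path, once:
 *
 *	init_irq_work(&hld_affinity_work, hld_affinity_fn);
 *
 * NMI handler: pick the next CPU from the mask (a plain read, no lock)
 * and punt the real work:
 *
 *	hld_next_cpu = cpumask_next(smp_processor_id(),
 *				    &hdata->monitored_mask);
 *	if (hld_next_cpu >= nr_cpu_ids)
 *		hld_next_cpu = cpumask_first(&hdata->monitored_mask);
 *	irq_work_queue(&hld_affinity_work);
 */

Thanks,

	tglx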