Re: + watchdog-fix-barriers-when-printing-backtraces-from-all-cpus.patch added to -mm tree

On Tue 2021-03-16 12:48:45, Peter Zijlstra wrote:
> On Thu, Mar 11, 2021 at 01:22:07PM -0800, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> > --- a/kernel/watchdog.c~watchdog-fix-barriers-when-printing-backtraces-from-all-cpus
> > +++ a/kernel/watchdog.c
> > @@ -409,12 +409,18 @@ static enum hrtimer_restart watchdog_tim
> >  		if (kvm_check_and_clear_guest_paused())
> >  			return HRTIMER_RESTART;
> >  
> > +		/*
> > +		 * Prevent multiple soft-lockup reports if one cpu is already
> > +		 * engaged in dumping all cpu back traces.
> > +		 */
> >  		if (softlockup_all_cpu_backtrace) {
> > -			/* Prevent multiple soft-lockup reports if one cpu is already
> > -			 * engaged in dumping cpu back traces
> > -			 */
> >  			if (test_and_set_bit(0, &soft_lockup_nmi_warn))
> >  				return HRTIMER_RESTART;
> > +			/*
> > +			 * Make sure that reports are serialized. Start
> > +			 * printing after getting the exclusive rights.
> > +			 */
> > +			smp_mb__after_atomic();
> 
> test_and_set_bit() is a value returning atomic and therefore already
> implies a full smp_mb() on both ends.

Still learning.
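
If I understand it correctly now, a minimal sketch of what that means
for this code path (assuming the documented semantics in
Documentation/atomic_bitops.txt):

		/*
		 * test_and_set_bit() is a value-returning atomic RMW, so it
		 * is fully ordered and already implies smp_mb() on both ends.
		 */
		if (test_and_set_bit(0, &soft_lockup_nmi_warn))
			return HRTIMER_RESTART;

		/*
		 * No smp_mb__after_atomic() needed here; the report below
		 * cannot be reordered before the bit has been taken.
		 */
		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", ...);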

> >  		}
> >  
> >  		/* Start period for the next softlockup warning. */
> > @@ -431,14 +437,14 @@ static enum hrtimer_restart watchdog_tim
> >  			dump_stack();
> >  
> >  		if (softlockup_all_cpu_backtrace) {
> > -			/* Avoid generating two back traces for current
> > -			 * given that one is already made above
> > -			 */
> >  			trigger_allbutself_cpu_backtrace();
> > -
> > +			/*
> > +			 * Make sure that everything is printed before
> > +			 * another CPU is allowed to report lockup again.
> > +			 */
> 
> This is not something smp_mb() ensures. smp_mb() is only concerned with
> memory access ordering, not completion, and certainly not some random
> IO.

> IOW, that comment is complete bollocks.

The problem is only the comment.

trigger_allbutself_cpu_backtrace() waits until all the NMI handlers
confirm that they printed the backtrace. The generic implementation
even does clear_bit_unlock(0, &backtrace_flag), see
nmi_trigger_cpumask_backtrace().
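
For reference, a heavily trimmed sketch of that generic code path (from
memory of lib/nmi_backtrace.c, so details may differ):

	void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
					   bool exclude_self,
					   void (*raise)(cpumask_t *mask))
	{
		int i, this_cpu = get_cpu();

		/* Only one CPU at a time may drive the dump. */
		if (test_and_set_bit(0, &backtrace_flag)) {
			put_cpu();
			return;
		}

		cpumask_copy(to_cpumask(backtrace_mask), mask);
		if (exclude_self)
			cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));

		/* Send the NMIs; each handler clears its cpu from the mask. */
		raise(to_cpumask(backtrace_mask));

		/* Wait up to 10 seconds for all CPUs to print their backtrace. */
		for (i = 0; i < 10 * 1000; i++) {
			if (cpumask_empty(to_cpumask(backtrace_mask)))
				break;
			mdelay(1);
		}

		clear_bit_unlock(0, &backtrace_flag);
		put_cpu();
	}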

> > +			smp_mb__before_atomic();
> > +			/* Allow a further report. */
> >  			clear_bit(0, &soft_lockup_nmi_warn);
> 
> 
> That said, this looks like a test-and-set lock and clear_bit unlock like
> situation. In fact it doesn't seem to use anything other than bit0 of
> that word.

Exactly, they prevent mixing two reports. There are more pieces
involved. The simplified code would look like this:

		if (test_and_set_bit_lock(0, &soft_lockup_nmi_warn))
			return HRTIMER_RESTART;

		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", ...);
		print_modules();
		print_irqtrace_events(current);
		if (regs)
			show_regs(regs);
		else
			dump_stack();

		trigger_allbutself_cpu_backtrace();

		/*
		 * Replaces the current clear_bit(0, &soft_lockup_nmi_warn)
		 * followed by smp_mb__after_atomic() (barrier to sync with
		 * other cpus); the release semantics are implied.
		 */
		clear_bit_unlock(0, &soft_lockup_nmi_warn);

> FWIW we have test_and_set_bit_lock() / clear_bit_unlock().

Thanks for the hint. I am going to update the patch.

IMHO, the best solution would be to use test_and_set_bit_lock()
and remove the confusing comment.
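
For illustration, an untested sketch of how the two hunks might end up
(no explicit barriers, no misleading comment about printing):

		if (softlockup_all_cpu_backtrace) {
			/* Only one CPU is allowed to report at a time. */
			if (test_and_set_bit_lock(0, &soft_lockup_nmi_warn))
				return HRTIMER_RESTART;
		}

		[...]

		if (softlockup_all_cpu_backtrace) {
			trigger_allbutself_cpu_backtrace();
			/* clear_bit_unlock() implies the release barrier. */
			clear_bit_unlock(0, &soft_lockup_nmi_warn);
		}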

Andrew, should I send the entire patchset or just fixup of this patch?

Best Regards,
Petr


