On 19/08/2020 09:54, Nicholas Piggin wrote:
> Excerpts from peterz@xxxxxxxxxxxxx's message of August 19, 2020 1:41 am:
>> On Tue, Aug 18, 2020 at 05:22:33PM +1000, Nicholas Piggin wrote:
>>> Excerpts from peterz@xxxxxxxxxxxxx's message of August 12, 2020 8:35 pm:
>>>> On Wed, Aug 12, 2020 at 06:18:28PM +1000, Nicholas Piggin wrote:
>>>>> Excerpts from peterz@xxxxxxxxxxxxx's message of August 7, 2020 9:11 pm:
>>>>>>
>>>>>> What's wrong with something like this?
>>>>>>
>>>>>> AFAICT there's no reason to actually try and add IRQ tracing here, it's
>>>>>> just a hand full of instructions at the most.
>>>>>
>>>>> Because we may want to use that in other places as well, so it would
>>>>> be nice to have tracing.
>>>>>
>>>>> Hmm... also, I thought NMI context was free to call local_irq_save/restore
>>>>> anyway so the bug would still be there in those cases?
>>>>
>>>> NMI code has in_nmi() true, in which case the IRQ tracing is disabled
>>>> (except for x86 which has CONFIG_TRACE_IRQFLAGS_NMI).
>>>>
>>>
>>> That doesn't help. It doesn't fix the lockdep irq state going out of
>>> synch with the actual irq state. The code which triggered this with the
>>> special powerpc irq disable has in_nmi() true as well.
>>
>> Urgh, you're talking about using lockdep_assert_irqs*() from NMI
>> context?
>>
>> If not, I'm afraid I might've lost the plot a little on what exact
>> failure case we're talking about.
>>
>
> Hm, I may have been a bit confused actually. Since your Fix
> TRACE_IRQFLAGS vs NMIs patch it might now work.
>
> I'm worried powerpc disables trace irqs trace_hardirqs_off()
> before nmi_enter() might still be a problem, but not sure
> actually. Alexey did you end up re-testing with Peter's patch

The one above in the thread which replaces powerpc_local_irq_pmu_save() with raw_powerpc_local_irq_pmu_save()? It did not compile as there is no raw_powerpc_local_irq_pmu_save() so I may be missing something here.
I applied the patch on top of the current upstream and replaced raw_powerpc_local_irq_pmu_save() with raw_local_irq_pmu_save() (which I think was the intention), but I still see the issue.

> or current upstream?

The upstream 18445bf405cb (13 hours old) also shows the problem. Your 1/2 still fixes it.

>
> Thanks,
> Nick
>

--
Alexey