On 20/Mar/2024 Palmer Dabbelt wrote: > On Tue, 13 Feb 2024 02:26:40 PST (-0800), tglx@xxxxxxxxxxxxx wrote: > > Nam! > > > > On Wed, Jan 31 2024 at 09:19, Nam Cao wrote: > >> RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as > >> explained in the description of Interrupt Completion in the PLIC spec: > >> > >> "The PLIC signals it has completed executing an interrupt handler by > >> writing the interrupt ID it received from the claim to the claim/complete > >> register. The PLIC does not check whether the completion ID is the same > >> as the last claim ID for that target. If the completion ID does not match > >> an interrupt source that *is currently enabled* for the target, the > >> completion is silently ignored." > >> > >> Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked") > >> ensured that EOI is successful by enabling interrupt first, before EOI. > >> > >> Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask > >> operations") removed the interrupt enabling code from the previous > >> commit, because it assumes that interrupt should already be enabled at the > >> point of EOI. However, this is incorrect: there is a window after a hart > >> claiming an interrupt and before irq_desc->lock getting acquired, > >> interrupt can be disabled during this window. Thus, EOI can be invoked > >> while the interrupt is disabled, effectively nullify this EOI. This > >> results in the interrupt never gets asserted again, and the device who > >> uses this interrupt appears frozen. > > > > Nice detective work! > > > >> Make sure that interrupt is really enabled before EOI. > >> > >> Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations") > >> Cc: <stable@xxxxxxxxxxxxxxx> > >> Signed-off-by: Nam Cao <namcao@xxxxxxxxxxxxx> > >> --- > >> v2: > >> - add unlikely() for optimization > >> - re-word commit message to make it clearer > >> > >> drivers/irqchip/irq-sifive-plic.c | 8 +++++++- > >> 1 file changed, 7 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c > >> index e1484905b7bd..0a233e9d9607 100644 > >> --- a/drivers/irqchip/irq-sifive-plic.c > >> +++ b/drivers/irqchip/irq-sifive-plic.c > >> @@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d) > >> { > >> struct plic_handler *handler = this_cpu_ptr(&plic_handlers); > >> > >> - writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM); > >> + if (unlikely(irqd_irq_disabled(d))) { > >> + plic_toggle(handler, d->hwirq, 1); > >> + writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM); > >> + plic_toggle(handler, d->hwirq, 0); > > > > It's unfortunate to have this condition in the hotpath, though it should > > be cache hot, easy to predict and compared to the writel() completely in > > the noise. > > Ya, I think it's fine. > > I guess we could try and play some tricks. Maybe hide the load latency > with a relaxed writel and some explict fencing, or claim interrupts when ^ you mean complete? > enabling them. Those both seem somewhat race-prone, though, so I'm not > even sure if they're sane. The latter option is what I also have in mind. Just need to make sure the interrupt is masked and we should be safe. Though there is the question of whether it's worth the effort. I may do that one day when I stop being lazy. Best regards, Nam