Nam! On Wed, Jan 31 2024 at 09:19, Nam Cao wrote: > RISC-V PLIC cannot "end-of-interrupt" (EOI) disabled interrupts, as > explained in the description of Interrupt Completion in the PLIC spec: > > "The PLIC signals it has completed executing an interrupt handler by > writing the interrupt ID it received from the claim to the claim/complete > register. The PLIC does not check whether the completion ID is the same > as the last claim ID for that target. If the completion ID does not match > an interrupt source that *is currently enabled* for the target, the > completion is silently ignored." > > Commit 69ea463021be ("irqchip/sifive-plic: Fixup EOI failed when masked") > ensured that EOI is successful by enabling interrupt first, before EOI. > > Commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask > operations") removed the interrupt enabling code from the previous > commit, because it assumes that interrupt should already be enabled at the > point of EOI. However, this is incorrect: there is a window after a hart > claiming an interrupt and before irq_desc->lock getting acquired, > interrupt can be disabled during this window. Thus, EOI can be invoked > while the interrupt is disabled, effectively nullify this EOI. This > results in the interrupt never gets asserted again, and the device who > uses this interrupt appears frozen. Nice detective work! > Make sure that interrupt is really enabled before EOI. > > Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations") > Cc: <stable@xxxxxxxxxxxxxxx> > Signed-off-by: Nam Cao <namcao@xxxxxxxxxxxxx> > --- > v2: > - add unlikely() for optimization > - re-word commit message to make it clearer > > drivers/irqchip/irq-sifive-plic.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c > index e1484905b7bd..0a233e9d9607 100644 > --- a/drivers/irqchip/irq-sifive-plic.c > +++ b/drivers/irqchip/irq-sifive-plic.c > @@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d) > { > struct plic_handler *handler = this_cpu_ptr(&plic_handlers); > > - writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM); > + if (unlikely(irqd_irq_disabled(d))) { > + plic_toggle(handler, d->hwirq, 1); > + writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM); > + plic_toggle(handler, d->hwirq, 0); It's unfortunate to have this condition in the hotpath, though it should be cache hot, easy to predict and compared to the writel() completely in the noise. > + } else { > + writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM); > + } > } Can the RISCV folks please have a look at this? Thanks, tglx