Quoting Douglas Anderson (2020-03-17 13:37:06) > Every once in a while (like once in a few months on a device) people > have seen warnings on devices using spi-geni-qcom like: > > irq ...: nobody cared (try booting with the "irqpoll" option) > > ...where the interrupt number listed matches up with "spi_geni" in > /proc/interrupts. > > You can get "nobody cared" if the interrupt handler returns IRQ_NONE. > Once you get into this state the driver will just stop working. > > Auditing the code makes me believe that it's probably not so good > checking "cur_mcmd" in the interrupt handler not under spinlock. > Let's move it to be under spinlock. Looking more closely at the code, > it looks as if there are some other places that could be under > spinlock, so let's add. It looks as if the original code was assuming > that by the time that the interrupt handler got called that the write > to "cur_mcmd" (to make it not CMD_NONE anymore) would make it to the > processor handling the interrupt. Perhaps with weakly ordered memory > this sometimes (though very rarely) happened. > > Let's also add a warning (with the IRQ status) in the case that we > ever end up getting an interrupt when we think we shouldn't. This > would give us a better chance to debug if this patch doesn't help the > issue. We'll also try our best to clear the interrupt to hopefully > get us out of this state. > > Patch is marked "speculative" because I have no way to reproduce this > problem, so I'm just hoping this fixes it. Weakly ordered memory > makes my brain hurt. It could be that. It could also be the poor design of geni_se_init() and how it enables many interrupts that this driver doesn't look to handle? Why do we allow the wrapper to enable all those bits in M_COMMON_GENI_M_IRQ_EN and S_COMMON_GENI_S_IRQ_EN if nobody is going to handle them? > > Fixes: 561de45f72bd ("spi: spi-geni-qcom: Add SPI driver support for GENI based QUP") > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx> > --- > > Changes in v2: > - Detect true spurious interrupt. > - Still return IRQ_NONE for state machine mismatch, but print warn. > > drivers/spi/spi-geni-qcom.c | 35 ++++++++++++++++++++++++++++++----- > 1 file changed, 30 insertions(+), 5 deletions(-) > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c > index c3972424af71..92e51ccb427f 100644 > --- a/drivers/spi/spi-geni-qcom.c > +++ b/drivers/spi/spi-geni-qcom.c > @@ -151,16 +151,19 @@ static void spi_geni_set_cs(struct spi_device *slv, bool set_flag) > struct geni_se *se = &mas->se; > unsigned long time_left; > > - reinit_completion(&mas->xfer_done); > pm_runtime_get_sync(mas->dev); > if (!(slv->mode & SPI_CS_HIGH)) > set_flag = !set_flag; > > + spin_lock_irq(&mas->lock); Why is this spin_lock_irq() vs. spin_lock_irqsave()? This isn't possible to be called from somewhere that hasn't changed irq flags? > + reinit_completion(&mas->xfer_done); > + > mas->cur_mcmd = CMD_CS; > if (set_flag) > geni_se_setup_m_cmd(se, SPI_CS_ASSERT, 0); > else > geni_se_setup_m_cmd(se, SPI_CS_DEASSERT, 0); > + spin_unlock_irq(&mas->lock); This will force on interrupts if they were masked. > > time_left = wait_for_completion_timeout(&mas->xfer_done, HZ); > if (!time_left) > @@ -307,6 +310,8 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, > u32 spi_tx_cfg, len; > struct geni_se *se = &mas->se; > > + spin_lock_irq(&mas->lock); > + > spi_tx_cfg = readl(se->base + SE_SPI_TRANS_CFG); > if (xfer->bits_per_word != mas->cur_bits_per_word) { > spi_setup_word_len(mas, mode, xfer->bits_per_word); > @@ -376,6 +381,8 @@ static void setup_fifo_xfer(struct spi_transfer *xfer, > */ > if (m_cmd & SPI_TX_ONLY) > writel(mas->tx_wm, se->base + SE_GENI_TX_WATERMARK_REG); > + > + spin_unlock_irq(&mas->lock); > } > > static int spi_geni_transfer_one(struct spi_master *spi, > @@ -478,13 +485,29 @@ static irqreturn_t geni_spi_isr(int irq, void *data) > struct geni_se *se = &mas->se; > u32 m_irq; > unsigned long flags; > - > - if (mas->cur_mcmd == CMD_NONE) > - return IRQ_NONE; > + irqreturn_t ret = IRQ_HANDLED; > > spin_lock_irqsave(&mas->lock, flags); > m_irq = readl(se->base + SE_GENI_M_IRQ_STATUS); Does this read need to be inside the lock? > > + /* Check for spurious interrupt */ > + if (!m_irq) { > + ret = IRQ_NONE; > + goto exit; I ask because it could be simplified by reading the status and then immediately returning IRQ_NONE if no bits are set without having to do the lock/unlock dance. And does writing 0 to the irq clear register do anything? > + } > + > + /* > + * If we got a real interrupt but software state machine thinks > + * we were idle then give a warning. We'll do our best to clear > + * the interrupt but we'll still return IRQ_NONE. If this keeps > + * happening the system will eventually disable us. > + */ > + if (mas->cur_mcmd == CMD_NONE) { > + pr_warn("Unexpected SPI interrupt: %#010x\n", m_irq); > + ret = IRQ_NONE; > + goto exit; > + } > + > if ((m_irq & M_RX_FIFO_WATERMARK_EN) || (m_irq & M_RX_FIFO_LAST_EN)) > geni_spi_handle_rx(mas); > > @@ -523,9 +546,11 @@ static irqreturn_t geni_spi_isr(int irq, void *data) > complete(&mas->xfer_done); > } > > +exit: > writel(m_irq, se->base + SE_GENI_M_IRQ_CLEAR); > spin_unlock_irqrestore(&mas->lock, flags); > - return IRQ_HANDLED; > + > + return ret; > } > > static int spi_geni_probe(struct platform_device *pdev)