On Tue, 2024-09-24 at 08:08 +0200, Markus Schneider-Pargmann wrote:
>
> On Mon, Sep 23, 2024 at 05:32:16PM GMT, Matthias Schiffer wrote:
> > The interrupt line of PCI devices is interpreted as edge-triggered,
> > however the interrupt signal of the m_can controller integrated in Intel
> > Elkhart Lake CPUs appears to be generated level-triggered.
> >
> > Consider the following sequence of events:
> >
> > - IR register is read, interrupt X is set
> > - A new interrupt Y is triggered in the m_can controller
> > - IR register is written to acknowledge interrupt X. Y remains set in IR
> >
> > As at no point in this sequence no interrupt flag is set in IR, the
> > m_can interrupt line will never become deasserted, and no edge will ever
> > be observed to trigger another run of the ISR. This was observed to
> > result in the TX queue of the EHL m_can to get stuck under high load,
> > because frames were queued to the hardware in m_can_start_xmit(), but
> > m_can_finish_tx() was never run to account for their successful
> > transmission.
> >
> > To fix the issue, repeatedly read and acknowledge interrupts at the
> > start of the ISR until no interrupt flags are set, so the next incoming
> > interrupt will also result in an edge on the interrupt line.
> >
> > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > Signed-off-by: Matthias Schiffer <matthias.schiffer@xxxxxxxxxxxxxxx>
>
> Just a few comment nitpicks below. Otherwise:
>
> Reviewed-by: Markus Schneider-Pargmann <msp@xxxxxxxxxxxx>

We have received a report that while this patch fixes a stuck queue issue
reproducible with cangen, the problem has not disappeared with our customer's
application. I will hold off sending a new version of the patch while we're
investigating whether there is a separate issue with the same symptoms or the
patch is insufficient.

Patch 1/2 should be good to go and could be applied independently.

Matthias

>
> > ---
> >
> > v2: introduce flag is_edge_triggered, so we can avoid the loop on !m_can_pci
> > v3:
> > - rename flag to irq_edge_triggered
> > - update comment to describe the issue more generically as one of systems with
> >   edge-triggered interrupt line. m_can_pci is mentioned as an example, as it
> >   is the only m_can variant that currently sets the irq_edge_triggered flag.
> >
> >  drivers/net/can/m_can/m_can.c     | 22 +++++++++++++++++-----
> >  drivers/net/can/m_can/m_can.h     |  1 +
> >  drivers/net/can/m_can/m_can_pci.c |  1 +
> >  3 files changed, 19 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> > index c85ac1b15f723..24e348f677714 100644
> > --- a/drivers/net/can/m_can/m_can.c
> > +++ b/drivers/net/can/m_can/m_can.c
> > @@ -1207,20 +1207,32 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
> >  static int m_can_interrupt_handler(struct m_can_classdev *cdev)
> >  {
> >  	struct net_device *dev = cdev->net;
> > -	u32 ir;
> > +	u32 ir = 0, ir_read;
> >  	int ret;
> >
> >  	if (pm_runtime_suspended(cdev->dev))
> >  		return IRQ_NONE;
> >
> > -	ir = m_can_read(cdev, M_CAN_IR);
> > +	/* The m_can controller signals its interrupt status as a level, but
> > +	 * depending in the integration the CPU may interpret the signal as
>                    ^ on?
>
> > +	 * edge-triggered (for example with m_can_pci).
> > +	 * We must observe that IR is 0 at least once to be sure that the next
>
> As the loop has a break for non edge-triggered chips, I think you should
> include that in the comment, like 'For these edge-triggered
> integrations, we must observe...' or something similar.
>
> Best
> Markus
>
> > +	 * interrupt will generate an edge.
> > +	 */
> > +	while ((ir_read = m_can_read(cdev, M_CAN_IR)) != 0) {
> > +		ir |= ir_read;
> > +
> > +		/* ACK all irqs */
> > +		m_can_write(cdev, M_CAN_IR, ir);
> > +
> > +		if (!cdev->irq_edge_triggered)
> > +			break;
> > +	}
> > +
> >  	m_can_coalescing_update(cdev, ir);
> >  	if (!ir)
> >  		return IRQ_NONE;
> >
> > -	/* ACK all irqs */
> > -	m_can_write(cdev, M_CAN_IR, ir);
> > -
> >  	if (cdev->ops->clear_interrupts)
> >  		cdev->ops->clear_interrupts(cdev);
> >
> > diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
> > index 92b2bd8628e6b..ef39e8e527ab6 100644
> > --- a/drivers/net/can/m_can/m_can.h
> > +++ b/drivers/net/can/m_can/m_can.h
> > @@ -99,6 +99,7 @@ struct m_can_classdev {
> >  	int pm_clock_support;
> >  	int pm_wake_source;
> >  	int is_peripheral;
> > +	bool irq_edge_triggered;
> >
> >  	// Cached M_CAN_IE register content
> >  	u32 active_interrupts;
> > diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
> > index d72fe771dfc7a..9ad7419f88f83 100644
> > --- a/drivers/net/can/m_can/m_can_pci.c
> > +++ b/drivers/net/can/m_can/m_can_pci.c
> > @@ -127,6 +127,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
> >  	mcan_class->pm_clock_support = 1;
> >  	mcan_class->pm_wake_source = 0;
> >  	mcan_class->can.clock.freq = id->driver_data;
> > +	mcan_class->irq_edge_triggered = true;
> >  	mcan_class->ops = &m_can_pci_ops;
> >
> >  	pci_set_drvdata(pci, mcan_class);
> > --
> > TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
> > Amtsgericht München, HRB 105018
> > Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
> > https://www.tq-group.com/