On 27.09.2023 18:44:42, Miquel Raynal wrote:
> Upstream commit 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with
> a soft reset on Renesas SoCs") fixes an issue with Renesas' own SJA1000
> CAN controller reception: the Rx buffer is only 5 messages long, so when
> the bus is loaded (e.g. a message every 50us), an overrun may easily
> happen. Upon an overrun, due to a possible internal crosstalk
> situation, the controller enters a frozen state which can only be
> unlocked with a soft reset (found experimentally). The solution was to
> offload a call to sja1000_start() to a threaded handler. This needs to
> happen in process context, as this operation may sleep. sja1000_start()
> basically enters "reset mode", performs a proper software reset and
> returns to "normal mode".
>
> Since this fix was introduced, we no longer observe any stalls in
> reception. However, it was sporadically observed that the transmit path
> would now freeze. Further investigation blamed the fix mentioned above,
> and especially the reset operation. Reproducing the reset in a loop
> helped identify what could go wrong. The sja1000 is a single Tx queue
> device, which leverages the netdev helpers to process one Tx message at
> a time. The logic is: the queue is stopped and the message is sent to
> the transceiver; once it is properly transmitted, the controller sets a
> status bit which triggers an interrupt; in the interrupt handler the
> transmission status is checked and the queue is woken up. Unfortunately,
> if an overrun happens, we might perform the soft reset precisely between
> the transmission of the buffer to the transceiver and the arrival of the
> transmission status bit. We would then stop the transmission operation
> without re-enabling the queue, causing all further transmissions to be
> ignored.
>
> The reset interrupt can only happen while the device is "open", and
> after a reset we want to resume normal operation anyway, no matter if a
> packet to transmit got dropped in the process, so we shall wake up the
> queue. Restarting the device and waking up the queue is exactly what
> sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
> the queue state, we must acquire a lock both in the reset handler and in
> the transmit path to ensure serialization of both operations. As the
> reset handler might still be called after the transmission of a frame to
> the transceiver but before it actually gets transmitted, we must ensure
> we don't leak the skb, so we free it (the behavior is consistent, no
> matter if there was an skb on the stack or not).
>
> Fixes: 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with a soft reset on Renesas SoCs")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
> ---
>
> Changes in v2:
> * As Marc suggested, use netif_tx_{,un}lock() instead of our own
>   spin_lock.
>
>  drivers/net/can/sja1000/sja1000.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
> index ae47fc72aa96..91e3fb3eed20 100644
> --- a/drivers/net/can/sja1000/sja1000.c
> +++ b/drivers/net/can/sja1000/sja1000.c
> @@ -297,6 +297,7 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
>  	if (can_dropped_invalid_skb(dev, skb))
>  		return NETDEV_TX_OK;
>
> +	netif_tx_lock(dev);
>  	netif_stop_queue(dev);
>
>  	fi = dlc = cf->can_dlc;
> @@ -335,6 +336,8 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
>
>  	sja1000_write_cmdreg(priv, cmd_reg_val);
>
> +	netif_tx_unlock(dev);
> +

I think netif_tx_lock() should be used in a different way. As far as I
understand it, you should call it only in sja1000_reset_interrupt(),
where you don't want the tx path to interfere.

Please test the new code with lockdep enabled.

Marc

>  	return NETDEV_TX_OK;
>  }
>
> @@ -396,7 +399,13 @@ static irqreturn_t sja1000_reset_interrupt(int irq, void *dev_id)
>  	struct net_device *dev = (struct net_device *)dev_id;
>
>  	netdev_dbg(dev, "performing a soft reset upon overrun\n");
> -	sja1000_start(dev);
> +
> +	netif_tx_lock(dev);
> +
> +	can_free_echo_skb(dev, 0);
> +	sja1000_set_mode(dev, CAN_MODE_START);
> +
> +	netif_tx_unlock(dev);
>
>  	return IRQ_HANDLED;
>  }
> --
> 2.34.1
>
>

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |