Re: [PATCH v2] can: sja1000: Always restart the Tx queue after an overrun

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 27.09.2023 18:44:42, Miquel Raynal wrote:
> Upstream commit 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with
> a soft reset on Renesas SoCs") fixes an issue with Renesas own SJA1000
> CAN controller reception: the Rx buffer is only 5 messages long, so when
> the bus loaded (eg. a message every 50us), overrun may easily
> happen. Upon an overrun situation, due to a possible internal crosstalk
> situation, the controller enters a frozen state which only can be
> unlocked with a soft reset (experimentally). The solution was to offload
> a call to sja1000_start() in a threaded handler. This needs to happen in
> process context as this operation requires to sleep. sja1000_start()
> basically enters "reset mode", performs a proper software reset and
> returns back into "normal mode".
> 
> Since this fix was introduced, we no longer observe any stalls in
> reception. However it was sporadically observed that the transmit path
> would now freeze. Further investigation blamed the fix mentioned above,
> and especially the reset operation. Reproducing the reset in a loop
> helped identifying what could possibly go wrong. The sja1000 is a single
> Tx queue device, which leverages the netdev helpers to process one Tx
> message at a time. The logic is: the queue is stopped, the message sent
> to the transceiver, once properly transmitted the controller sets a
> status bit which triggers an interrupt, in the interrupt handler the
> transmission status is checked and the queue woken up. Unfortunately, if
> an overrun happens, we might perform the soft reset precisely between
> the transmission of the buffer to the transceiver and the advent of the
> transmission status bit. We would then stop the transmission operation
> without re-enabling the queue, leading to all further transmissions to
> be ignored.
> 
> The reset interrupt can only happen while the device is "open", and
> after a reset we anyway want to resume normal operations, no matter if a
> packet to transmit got dropped in the process, so we shall wake up the
> queue. Restarting the device and waking-up the queue is exactly what
> sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
> the queue state, we must acquire a lock both in the reset handler and in
> the transmit path to ensure serialization of both operations. As the
> reset handler might still be called after the transmission of a frame to
> the transceiver but before it actually gets transmitted, we must ensure
> we don't leak the skb, so we free it (the behavior is consistent, no
> matter if there was an skb on the stack or not).
> 
> Fixes: 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with a soft reset on Renesas SoCs")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
> ---
> 
> Changes in v2:
> * As Marc sugested, use netif_tx_{,un}lock() instead of our own
>   spin_lock.
> 
>  drivers/net/can/sja1000/sja1000.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
> index ae47fc72aa96..91e3fb3eed20 100644
> --- a/drivers/net/can/sja1000/sja1000.c
> +++ b/drivers/net/can/sja1000/sja1000.c
> @@ -297,6 +297,7 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
>  	if (can_dropped_invalid_skb(dev, skb))
>  		return NETDEV_TX_OK;
>  
> +	netif_tx_lock(dev);
>  	netif_stop_queue(dev);
>  
>  	fi = dlc = cf->can_dlc;
> @@ -335,6 +336,8 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
>  
>  	sja1000_write_cmdreg(priv, cmd_reg_val);
>  
> +	netif_tx_unlock(dev);
> +

I think netif_tx_lock() should be used in a different way. As far as I
understand it, you should call it only in the sja1000_reset_interrupt(),
where you want to tx path to interfere.

Please test the new code with lockdep enabled.

Marc

>  	return NETDEV_TX_OK;
>  }
>  
> @@ -396,7 +399,13 @@ static irqreturn_t sja1000_reset_interrupt(int irq, void *dev_id)
>  	struct net_device *dev = (struct net_device *)dev_id;
>  
>  	netdev_dbg(dev, "performing a soft reset upon overrun\n");
> -	sja1000_start(dev);
> +
> +	netif_tx_lock(dev);
> +
> +	can_free_echo_skb(dev, 0);
> +	sja1000_set_mode(dev, CAN_MODE_START);
> +
> +	netif_tx_unlock(dev);
>  
>  	return IRQ_HANDLED;
>  }
> -- 
> 2.34.1
> 
> 

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux