Re: [PATCH net] can: sja1000: Always restart the Tx queue after an overrun

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 27.09.2023 11:30:16, Marc Kleine-Budde wrote:
> On 22.09.2023 17:47:27, Miquel Raynal wrote:
> > Upstream commit 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with
> > a soft reset on Renesas SoCs") fixes an issue with Renesas own SJA1000
> > CAN controller reception: the Rx buffer is only 5 messages long, so when
> > the bus loaded (eg. a message every 50us), overrun may easily
> > happen. Upon an overrun situation, due to a possible internal crosstalk
> > situation, the controller enters a frozen state which only can be
> > unlocked with a soft reset (experimentally). The solution was to offload
> > a call to sja1000_start() in a threaded handler. This needs to happen in
> > process context as this operation requires to sleep. sja1000_start()
> > basically enters "reset mode", performs a proper software reset and
> > returns back into "normal mode".
> > 
> > Since this fix was introduced, we no longer observe any stalls in
> > reception. However it was sporadically observed that the transmit path
> > would now freeze. Further investigation blamed the fix mentioned above,
> > and especially the reset operation. Reproducing the reset in a loop
> > helped identifying what could possibly go wrong. The sja1000 is a single
> > Tx queue device, which leverages the netdev helpers to process one Tx
> > message at a time. The logic is: the queue is stopped, the message sent
> > to the transceiver, once properly transmitted the controller sets a
> > status bit which triggers an interrupt, in the interrupt handler the
> > transmission status is checked and the queue woken up. Unfortunately, if
> > an overrun happens, we might perform the soft reset precisely between
> > the transmission of the buffer to the transceiver and the advent of the
> > transmission status bit. We would then stop the transmission operation
> > without re-enabling the queue, leading to all further transmissions to
> > be ignored.
> > 
> > The reset interrupt can only happen while the device is "open", and
> > after a reset we anyway want to resume normal operations, no matter if a
> > packet to transmit got dropped in the process, so we shall wake up the
> > queue. Restarting the device and waking-up the queue is exactly what
> > sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
> > the queue state, we must acquire a lock both in the reset handler and in
> > the transmit path to ensure serialization of both operations. As the
> > reset handler might still be called after the transmission of a frame to
> > the transceiver but before it actually gets transmitted, we must ensure
> > we don't leak the skb, so we free it (the behavior is consistent, no
> > matter if there was an skb on the stack or not).
> 
> Can you make use of netif_tx_disable() and netif_wake_queue() in
> sja1000_reset_interrupt() instead of the lock?

...or netif_tx_lock()/netif_tx_unlock().

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde          |
Embedded Linux                   | https://www.pengutronix.de |
Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux