On Wed, 2024-05-15 at 13:12 +0200, Marc Kleine-Budde wrote: > On 14.05.2024 15:34:01, Vitor Soares wrote: > > > > +void mcp251xfd_tx_obj_write_sync(struct work_struct *ws) > > > > +{ > > > > + struct mcp251xfd_priv *priv = container_of(ws, struct > > > > mcp251xfd_priv, > > > > + tx_work); > > > > + struct mcp251xfd_tx_obj *tx_obj = priv->tx_work_obj; > > > > + struct mcp251xfd_tx_ring *tx_ring = priv->tx; > > > > + int err; > > > > + > > > > + err = spi_sync(priv->spi, &tx_obj->msg); > > > > + if (err) > > > > + mcp251xfd_tx_failure_drop(priv, tx_ring, err); > > > > + > > > > + priv->tx_work_obj = NULL; > > > > > > Race condition: > > > - after spi_sync() the CAN frame is send > > > - after the TX complete IRQ the TX queue is restarted > > > - the xmit handler might get BUSY > > > - fill the tx_work_obj again > > You can avoid the race condition by moving "priv->tx_work_obj = NULL;" > in front of the "spi_sync();". Right? > > > > > +} > > > > + > > > > static int mcp251xfd_tx_obj_write(const struct mcp251xfd_priv *priv, > > > > struct mcp251xfd_tx_obj *tx_obj) > > > > { > > > > @@ -175,7 +210,7 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff > > > > *skb, > > > > if (can_dev_dropped_skb(ndev, skb)) > > > > return NETDEV_TX_OK; > > > > > > > > - if (mcp251xfd_tx_busy(priv, tx_ring)) > > > > + if (mcp251xfd_tx_busy(priv, tx_ring) || priv->tx_work_obj) > > > > > > This should not happen, but better save than sorry. > > > > As there is the race condition you mentioned above, on this condition: > > priv->tx_work_obj = tx_obj --> xmit will return NETDEV_TX_BUSY > > > > or > > > > priv->tx_work_obj = NULL --> It goes through the rest of the code or > > the workqueue may sleep after setting tx_work_obj to NULL. Should I > > use work_busy() here instead or do you have another suggestion? > > Yes, introduce mcp251xfd_work_busy(). > I'll implement it. > I'm not sure what happens if the xmit is called between the > "priv->tx_work_obj = NULL" and the end of the work. Will queue_work() > return false, as the queue is still running? >From the test I did so far, my understanding is the following: If mcp251xfd_tx_obj_write doesn't fail, everything is OK. if mcp251xfd_tx_obj_write fails with EBUSY - stop netif queue - fill the tx_work_obj - start worker queue_work() doesn't return false even when work_busy() = true. - xmit handler return, and wait netif_wake_queue() - the work handler waits until the previous job gets done before starting the next one. - after the TX completes IRQ, the TX queue is restarted If the TX queue is restarted immediately after queue_work(), tx_work_obj is filled, making the xmit handler return NETDEV_TX_BUSY. The tests were done with a delay after priv->tx_work_obj = NULL. Best regards, Vitor Soares