Hi Jacob, > Il 23/09/2022 21:21 Jacob Kroon <jacob.kroon@xxxxxxxxx> ha scritto: > > > On 9/23/22 21:03, Jacob Kroon wrote: > > Hi Dario, > > > > On 9/23/22 19:55, dariobin@xxxxxxxxx wrote: > >> Hi Michael, > >> > >>> Il 23/09/2022 13:36 Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> ha scritto: > >>> > >>> On 16.09.2022 06:14:58, Jacob Kroon wrote: > >>>> What I do know is that if I revert commit: > >>>> > >>>> "can: c_can: cache frames to operate as a true FIFO" > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=387da6bc7a826cc6d532b1c0002b7c7513238d5f > >>>> > >>>> then everything looks good. I don't get any BUG messages, and the host > >>>> has been running overnight without problems, so it seems to have fixed > >>>> the network interface lockup as well. > >>> > >>> Jacob, after this mail, I'll send 2 patches. One tries to disable the > >>> cache feature for C_CAN cores, the other shuts a potential race window. > >> > >> About the "can: c_can: don't cache TX messages for C_CAN cores" patch: > >> Could it make sense to change the c_can_start_xmit in this way > >> > >> - if (idx < c_can_get_tx_tail(tx_ring)) > >> - cmd &= ~IF_COMM_TXRQST; /* Cache the message */ > >> + if (idx < c_can_get_tx_tail(tx_ring)) { > >> + if (priv->type == BOSCH_D_CAN) { > >> + cmd &= ~IF_COMM_TXRQST; /* Cache the message */ > >> + } else { > >> + netif_stop_queue(priv->dev); > >> + return NETDEV_TX_BUSY; > >> + } > >> + } > >> > >> without changing the c_can_get_tx_{free,busy} routines ? > >> > > > > I can test, so you suggest only doing the above patch, or were there > > other parts from Marc's patch you wanted in ? > > > > Well I tried only the above, and the patch below > > diff --git a/drivers/net/can/c_can/c_can_main.c > b/drivers/net/can/c_can/c_can_main.c > index a7362af0babb..21d0a55ebbb3 100644 > --- a/drivers/net/can/c_can/c_can_main.c > +++ b/drivers/net/can/c_can/c_can_main.c > @@ -468,8 +468,14 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff > *skb, > if (c_can_get_tx_free(tx_ring) == 0) > netif_stop_queue(dev); > > - if (idx < c_can_get_tx_tail(tx_ring)) > - cmd &= ~IF_COMM_TXRQST; /* Cache the message */ > + if (idx < c_can_get_tx_tail(tx_ring)) { > + if (priv->type == BOSCH_D_CAN) { > + cmd &= ~IF_COMM_TXRQST; /* Cache the message */ > + } else { > + netif_stop_queue(priv->dev); > + return NETDEV_TX_BUSY; > + } > + } > > /* Store the message in the interface so we can call > * can_put_echo_skb(). We must do this before we enable > @@ -761,7 +767,7 @@ static void c_can_do_tx(struct net_device *dev) > > tail = c_can_get_tx_tail(tx_ring); > > - if (tail == 0) { > + if (priv->type == BOSCH_D_CAN && tail == 0) { > u8 head = c_can_get_tx_head(tx_ring); > > /* Start transmission for all cached messages */ > > > but they both seem to lockup the network. > I didn't understand if you applied two patches separately or not. This was the only patch I had thought of. Which was similar to Marc's for the interrupt part but differed in the c_can_start_xmit part. --- a/drivers/net/can/c_can/c_can_main.c +++ b/drivers/net/can/c_can/c_can_main.c @@ -464,13 +464,19 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb, return NETDEV_TX_BUSY; idx = c_can_get_tx_head(tx_ring); + if (idx < c_can_get_tx_tail(tx_ring)) { + if (priv->type == BOSCH_D_CAN) { + cmd &= ~IF_COMM_TXRQST; /* Cache the message */ + } else { + netif_stop_queue(priv->dev); + return NETDEV_TX_BUSY; + } + } + tx_ring->head++; if (c_can_get_tx_free(tx_ring) == 0) netif_stop_queue(dev); - if (idx < c_can_get_tx_tail(tx_ring)) - cmd &= ~IF_COMM_TXRQST; /* Cache the message */ - /* Store the message in the interface so we can call * can_put_echo_skb(). We must do this before we enable * transmit as we might race against do_tx(). @@ -723,7 +729,6 @@ static void c_can_do_tx(struct net_device *dev) struct c_can_tx_ring *tx_ring = &priv->tx; struct net_device_stats *stats = &dev->stats; u32 idx, obj, pkts = 0, bytes = 0, pend; - u8 tail; if (priv->msg_obj_tx_last > 32) pend = priv->read_reg32(priv, C_CAN_INTPND3_REG); @@ -759,15 +764,20 @@ static void c_can_do_tx(struct net_device *dev) stats->tx_bytes += bytes; stats->tx_packets += pkts; - tail = c_can_get_tx_tail(tx_ring); + if (priv->type == BOSCH_D_CAN) { + u8 tail; + + tail = c_can_get_tx_tail(tx_ring); - if (tail == 0) { - u8 head = c_can_get_tx_head(tx_ring); + if (tail == 0) { + u8 head = c_can_get_tx_head(tx_ring); - /* Start transmission for all cached messages */ - for (idx = tail; idx < head; idx++) { - obj = idx + priv->msg_obj_tx_first; - c_can_object_put(dev, IF_NAPI, obj, IF_COMM_TXRQST); + /* Start transmission for all cached messages */ + for (idx = tail; idx < head; idx++) { + obj = idx + priv->msg_obj_tx_first; + c_can_object_put(dev, IF_NAPI, obj, + IF_COMM_TXRQST); + } } } } I changed it a little from the previous email because I noticed a problem. Thanks and regards, Dario > Jacob