Hi Marc and Dario,
On 9/16/22 06:14, Jacob Kroon wrote:
...
> What I do know is that if I revert commit:
> "can: c_can: cache frames to operate as a true FIFO"
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=387da6bc7a826cc6d532b1c0002b7c7513238d5f
> then everything looks good. I don't get any BUG messages, and the host
> has been running overnight without problems, so it seems to have fixed
> the network interface lockup as well.
I ran the kernel *with* the commit above, and also with the following patch:
diff --git a/drivers/net/can/c_can/c_can_main.c b/drivers/net/can/c_can/c_can_main.c
index 52671d1ea17d..4375dc70e21f 100644
--- a/drivers/net/can/c_can/c_can_main.c
+++ b/drivers/net/can/c_can/c_can_main.c
@@ -1,3 +1,4 @@
+#define DEBUG
 /*
  * CAN bus driver for Bosch C_CAN controller
  *
@@ -469,8 +470,15 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	if (c_can_get_tx_free(tx_ring) == 0)
 		netif_stop_queue(dev);
 
-	if (idx < c_can_get_tx_tail(tx_ring))
+	netdev_dbg(dev, "JAKR:%d:%d:%d:%d\n", idx,
+		   c_can_get_tx_head(tx_ring),
+		   c_can_get_tx_tail(tx_ring),
+		   c_can_get_tx_free(tx_ring));
+
+	if (idx < c_can_get_tx_tail(tx_ring)) {
 		cmd &= ~IF_COMM_TXRQST; /* Cache the message */
+		netdev_dbg(dev, "JAKR:Caching messages\n");
+	}
 
 	/* Store the message in the interface so we can call
 	 * can_put_echo_skb(). We must do this before we enable
and I've uploaded the entire log I could capture from /dev/kmsg, right
up to the hang, here:
https://pastebin.com/6hvAcPc9
What looks odd to me right from the start is that sometimes when idx
rolls over to 0, and *only* when it rolls over to 0, the CAN frame gets
cached because "idx < c_can_get_tx_tail(tx_ring)".
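To make that comparison concrete, here is a minimal user-space sketch of how I read the index arithmetic. It is not the driver code: the struct and the model_* helper names are made up for illustration, and it assumes that head and tail are free-running counters masked by a power-of-two ring size, and that idx is the masked head taken before head is incremented.

```c
/* Hypothetical user-space model of the TX ring index arithmetic.
 * Names are illustrative only; they are not the driver's identifiers.
 */
struct tx_ring_model {
	unsigned int head;	/* free-running producer counter */
	unsigned int tail;	/* free-running consumer counter */
	unsigned int obj_num;	/* ring size, assumed to be a power of two */
};

/* Slot index the next frame would be written to (assumed masking). */
static unsigned int model_tx_head(const struct tx_ring_model *r)
{
	return r->head & (r->obj_num - 1);
}

/* Slot index of the oldest frame that has not completed yet. */
static unsigned int model_tx_tail(const struct tx_ring_model *r)
{
	return r->tail & (r->obj_num - 1);
}

/* Returns 1 if the "idx < tail" test from the patch would cache the
 * frame (i.e. clear the transmit request), 0 otherwise.
 */
static int model_would_cache(const struct tx_ring_model *r)
{
	unsigned int idx = model_tx_head(r);

	return idx < model_tx_tail(r);
}
```

Under those assumptions, with obj_num = 8, head = 8 and tail = 5, idx wraps to 0 while tail is still 5, so the frame would be cached. With tail = 8 (everything completed) idx and tail are both 0 and the frame would be sent immediately.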
Is it possible there is some difference between c_can and d_can in how
the HW buffers are working, which breaks the driver on my particular HW
setup?
Regards,
Jacob