Hi, I've bumped into what I think is a chip bug that the C_CAN/D_CAN driver isn't handling. It can get into a state where the chip status register reports it's bus off, but the can driver doesn't know, so the bus never gets restarted. Looks like the chip isn't firing the interrupt or is firing with the interrupt register as zero. Either is wrong and means "c_can_poll" is never called, and thus the driver never picks up the bus off. We are turning on/off the can device we are talking to, and we have to do this a lot to cause this. But we can get into this state and then the manual fix is to do "ifdown can0 && ifup can0" to sync up the driver and the chip. If you don't everything looks fine but nothing you send goes out to the bus and you never receive anything. When this issue bites, the last messages you see in candump are: can0 20000004 [8] 00 04 00 00 00 00 00 79 ERRORFRAME can0 20000004 [8] 00 10 00 00 00 00 00 79 ERRORFRAME You see this in candump on other iterations of the test, but often see the following : can0 20000040 [8] 00 00 00 00 00 00 00 00 ERRORFRAME can0 20000100 [8] 00 00 00 00 00 00 00 00 ERRORFRAME You obviously see a "c_can_platform 481cc000.can can0: bus-off" and "c_can_platform 481cc000.can can0: restarted" in dmesg with the above can messages. As I understand it, it's the BBB end that is sending these two. When you don't see these two following, there isn't a (lasting anyway) detected bus off, so the traffic between the device and the BBB starts as normal when power comes on. What I've done is catch the bus off in "c_can_start_xmit" on a "can_send" and if it is an unknown bus off, schedule "c_can_poll" which will do what is required. So it self fixes. I figured even if it's something odd about the device we are talking to causing this, it shouldn't be able to get into this state. This was on 4.4 but I see that 4.18 is basically the same code. Anyway, this is what we are doing and now I've done due diligence passing the information on. :-) Patch attached. Regards, Joe P.S. Don't know if "http://www.keil.com/dd/docs/datashts/silabs/boschcan_ug.pdf" is an acceptable link for the datasheet, but the URL for the datasheet in the code is 404'ed.
>From 71e43698938545a1eb58e42647e559c2e85d169b Mon Sep 17 00:00:00 2001 From: Joe Burmeister <joe.burmeister@xxxxxxxxxxxxx> Date: Wed, 20 Jun 2018 17:01:19 +0100 Subject: [PATCH] Handle missed interupt for bus off with can bus. --- drivers/net/can/c_can/c_can.c | 19 +++++++++++++++++++ drivers/net/can/c_can/c_can.h | 1 + 2 files changed, 20 insertions(+) diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c index 606b7d8ffe13..97959b1cbe81 100644 --- a/drivers/net/can/c_can/c_can.c +++ b/drivers/net/can/c_can/c_can.c @@ -461,6 +461,19 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb, struct can_frame *frame = (struct can_frame *)skb->data; struct c_can_priv *priv = netdev_priv(dev); u32 idx, obj; + u16 curr = priv->read_reg(priv, C_CAN_STS_REG); + + if (curr & STATUS_BOFF) { + if (priv->can.state != CAN_STATE_BUS_OFF) { + netdev_warn(dev, "Unexpected bus off, scheduling status check.\n"); + if (!priv->missed_status_change) { + priv->missed_status_change = 1; + napi_schedule(&priv->napi); + } + } + netif_stop_queue(dev); + return NETDEV_TX_BUSY; + } if (can_dropped_invalid_skb(dev, skb)) return NETDEV_TX_OK; @@ -628,6 +641,7 @@ static int c_can_start(struct net_device *dev) IF_COMM_RCV_LOW : IF_COMM_RCV_HIGH; priv->can.state = CAN_STATE_ERROR_ACTIVE; + priv->missed_status_change = 0; /* Attempt to use "active" if available else use "default" */ p = pinctrl_get_select(priv->device, "active"); @@ -1029,6 +1043,11 @@ static int c_can_poll(struct napi_struct *napi, int quota) u16 curr, last = priv->last_status; int work_done = 0; + if (priv->missed_status_change) { + netdev_warn(dev, "Processing missed status change.\n"); + priv->missed_status_change = 0; + } + priv->last_status = curr = priv->read_reg(priv, C_CAN_STS_REG); /* Ack status on C_CAN. D_CAN is self clearing */ if (priv->type != BOSCH_D_CAN) diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h index 8acdc7fa4792..dba34be1bd89 100644 --- a/drivers/net/can/c_can/c_can.h +++ b/drivers/net/can/c_can/c_can.h @@ -200,6 +200,7 @@ struct c_can_priv { atomic_t tx_active; unsigned long tx_dir; int last_status; + u16 missed_status_change; u16 (*read_reg) (const struct c_can_priv *priv, enum reg index); void (*write_reg) (const struct c_can_priv *priv, enum reg index, u16 val); u32 (*read_reg32) (const struct c_can_priv *priv, enum reg index); -- 2.17.1