C_CAN/D_CAN bug and fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I've bumped into what I think is a chip bug that the C_CAN/D_CAN driver
isn't handling.

It can get into a state where the chip status register reports it's bus
off, but the can driver doesn't know, so the bus never gets restarted.

Looks like the chip isn't firing the interrupt or is firing with the
interrupt register as zero. Either is wrong and means "c_can_poll" is
never called, and thus the driver never picks up the bus off.

We are turning on/off the can device we are talking to, and we have to
do this a lot to cause this. But we can get into this state and then the
manual fix is to do "ifdown can0 && ifup can0" to sync up the driver and
the chip. If you don't everything looks fine but nothing you send goes
out to the bus and you never receive anything.

When this issue bites, the last messages you see in candump are:

can0  20000004   [8]  00 04 00 00 00 00 00 79   ERRORFRAME
can0  20000004   [8]  00 10 00 00 00 00 00 79   ERRORFRAME

You see this in candump on other iterations of the test, but often see
the following :

can0  20000040   [8]  00 00 00 00 00 00 00 00   ERRORFRAME
can0  20000100   [8]  00 00 00 00 00 00 00 00   ERRORFRAME

You obviously see a "c_can_platform 481cc000.can can0: bus-off" and
"c_can_platform 481cc000.can can0: restarted" in dmesg with the above
can messages. As I understand it, it's the BBB end that is sending these
two. When you don't see these two following, there isn't a (lasting
anyway) detected bus off, so the traffic between the device and the BBB
starts as normal when power comes on.

What I've done is catch the bus off in "c_can_start_xmit" on a
"can_send" and if it is an unknown bus off, schedule "c_can_poll" which
will do what is required. So it self fixes.

I figured even if it's something odd about the device we are talking to
causing this, it shouldn't be able to get into this state.

This was on 4.4 but I see that 4.18 is basically the same code.

Anyway, this is what we are doing and now I've done due diligence
passing the information on. :-)

Patch attached.

Regards,

Joe

P.S. Don't know if
"http://www.keil.com/dd/docs/datashts/silabs/boschcan_ug.pdf"; is an
acceptable link for the datasheet, but the URL for the datasheet in the
code is 404'ed.
>From 71e43698938545a1eb58e42647e559c2e85d169b Mon Sep 17 00:00:00 2001
From: Joe Burmeister <joe.burmeister@xxxxxxxxxxxxx>
Date: Wed, 20 Jun 2018 17:01:19 +0100
Subject: [PATCH] Handle missed interupt for bus off with can bus.

---
 drivers/net/can/c_can/c_can.c | 19 +++++++++++++++++++
 drivers/net/can/c_can/c_can.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 606b7d8ffe13..97959b1cbe81 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -461,6 +461,19 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	struct can_frame *frame = (struct can_frame *)skb->data;
 	struct c_can_priv *priv = netdev_priv(dev);
 	u32 idx, obj;
+	u16 curr = priv->read_reg(priv, C_CAN_STS_REG);
+
+	if (curr & STATUS_BOFF) {
+		if (priv->can.state != CAN_STATE_BUS_OFF) {
+			netdev_warn(dev, "Unexpected bus off, scheduling status check.\n");
+			if (!priv->missed_status_change) {
+				priv->missed_status_change = 1;
+				napi_schedule(&priv->napi);
+			}
+		}
+		netif_stop_queue(dev);
+		return NETDEV_TX_BUSY;
+	}
 
 	if (can_dropped_invalid_skb(dev, skb))
 		return NETDEV_TX_OK;
@@ -628,6 +641,7 @@ static int c_can_start(struct net_device *dev)
 		IF_COMM_RCV_LOW : IF_COMM_RCV_HIGH;
 
 	priv->can.state = CAN_STATE_ERROR_ACTIVE;
+	priv->missed_status_change = 0;
 
 	/* Attempt to use "active" if available else use "default" */
 	p = pinctrl_get_select(priv->device, "active");
@@ -1029,6 +1043,11 @@ static int c_can_poll(struct napi_struct *napi, int quota)
 	u16 curr, last = priv->last_status;
 	int work_done = 0;
 
+	if (priv->missed_status_change) {
+		netdev_warn(dev, "Processing missed status change.\n");
+		priv->missed_status_change = 0;
+	}
+
 	priv->last_status = curr = priv->read_reg(priv, C_CAN_STS_REG);
 	/* Ack status on C_CAN. D_CAN is self clearing */
 	if (priv->type != BOSCH_D_CAN)
diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index 8acdc7fa4792..dba34be1bd89 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -200,6 +200,7 @@ struct c_can_priv {
 	atomic_t tx_active;
 	unsigned long tx_dir;
 	int last_status;
+	u16 missed_status_change;
 	u16 (*read_reg) (const struct c_can_priv *priv, enum reg index);
 	void (*write_reg) (const struct c_can_priv *priv, enum reg index, u16 val);
 	u32 (*read_reg32) (const struct c_can_priv *priv, enum reg index);
-- 
2.17.1


[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux