Hello Joe,

I have some more questions...

On 20.06.2018 at 19:00, Joe Burmeister wrote:
> Hi,
>
> I've bumped into what I think is a chip bug that the C_CAN/D_CAN driver
> isn't handling.
>
> It can get into a state where the chip status register reports it's bus
> off, but the can driver doesn't know, so the bus never gets restarted.
>
> Looks like the chip isn't firing the interrupt or is firing with the
> interrupt register as zero. Either is wrong and means "c_can_poll" is
> never called, and thus the driver never picks up the bus off.
>
> We are turning on/off the can device we are talking to, and we have to
> do this a lot to cause this. But we can get into this state and then the

By on/off, do you mean "ifconfig up/down"? Is it always the first bus-off
that makes trouble after you switched on the device? Does the "bus-off"
condition occur frequently? Can bus-off also occur during the start of
the CAN device (ifconfig up)?

> manual fix is to do "ifdown can0 && ifup can0" to sync up the driver and
> the chip. If you don't everything looks fine but nothing you send goes
> out to the bus and you never receive anything.
>
> When this issue bites, the last messages you see in candump are:
>
> can0 20000004 [8] 00 04 00 00 00 00 00 79 ERRORFRAME
> can0 20000004 [8] 00 10 00 00 00 00 00 79 ERRORFRAME
>
> You see this in candump on other iterations of the test, but often see
> the following :
>
> can0 20000040 [8] 00 00 00 00 00 00 00 00 ERRORFRAME
> can0 20000100 [8] 00 00 00 00 00 00 00 00 ERRORFRAME
>
> You obviously see a "c_can_platform 481cc000.can can0: bus-off" and
> "c_can_platform 481cc000.can can0: restarted" in dmesg with the above
> can messages. As I understand it, it's the BBB end that is sending these
> two. When you don't see these two following, there isn't a (lasting
> anyway) detected bus off, so the traffic between the device and the BBB
> starts as normal when power comes on.
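
For reference, the error frames quoted above can be decoded from the
error-class bits in the CAN ID and the controller status in data[1].
The following is a minimal Python sketch, assuming the bit values
defined in include/uapi/linux/can/error.h. The function name and the
text labels are hypothetical, chosen only for illustration.

```python
# Bit values mirror include/uapi/linux/can/error.h (assumption: unchanged
# between kernel versions). Names and labels below are illustrative only.
CAN_ERR_FLAG = 0x20000000   # marks a SocketCAN error message frame
CAN_ERR_MASK = 0x1FFFFFFF   # error-class bits live in the lower 29 bits

ERROR_CLASSES = {
    0x001: "tx-timeout",
    0x002: "lost-arbitration",
    0x004: "controller-problem",
    0x008: "protocol-violation",
    0x010: "transceiver-status",
    0x020: "no-ack",
    0x040: "bus-off",
    0x080: "bus-error",
    0x100: "restarted",
}

# data[1] is only meaningful when the controller-problem class bit is set.
CONTROLLER_STATUS = {
    0x01: "rx-overflow",
    0x02: "tx-overflow",
    0x04: "rx-error-warning",
    0x08: "tx-error-warning",
    0x10: "rx-error-passive",
    0x20: "tx-error-passive",
    0x40: "error-active",
}


def decode_error_frame(can_id, data):
    """Return (error classes, controller status details) for an error frame."""
    if not can_id & CAN_ERR_FLAG:
        raise ValueError("not an error frame")
    bits = can_id & CAN_ERR_MASK
    classes = [name for bit, name in ERROR_CLASSES.items() if bits & bit]
    details = []
    if bits & 0x004:
        details = [name for bit, name in CONTROLLER_STATUS.items()
                   if data[1] & bit]
    return classes, details


if __name__ == "__main__":
    frames = [
        (0x20000004, "0004000000000079"),
        (0x20000004, "0010000000000079"),
        (0x20000040, "0000000000000000"),
        (0x20000100, "0000000000000000"),
    ]
    for can_id, payload in frames:
        print(hex(can_id), decode_error_frame(can_id, bytes.fromhex(payload)))
```

Read this way, the first pair of frames reports a controller problem
with RX error warning and then RX error passive, and the second pair
reports bus-off followed by restarted.
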
>
> What I've done is catch the bus off in "c_can_start_xmit" on a
> "can_send" and if it is an unknown bus off, schedule "c_can_poll" which
> will do what is required. So it self fixes.
>
> I figured even if it's something odd about the device we are talking to
> causing this, it shouldn't be able to get into this state.
>
> This was on 4.4 but I see that 4.18 is basically the same code.
>
> Anyway, this is what we are doing and now I've done due diligence
> passing the information on. :-)
>
> Patch attached.

OK, an extra napi_schedule() finds the bus-off then. Needs more thoughts...

>
> Regards,
>
> Joe
>
> P.S. Don't know if
> "http://www.keil.com/dd/docs/datashts/silabs/boschcan_ug.pdf" is an
> acceptable link for the datasheet, but the URL for the datasheet in the
> code is 404'ed.

I also realized some time ago that the link is broken :(. Your link to
Keil looks good, though.

Wolfgang.