On 21/06/18 10:07, Wolfgang Grandegger wrote:
> Hello Joe,
>
> Am 21.06.2018 um 10:57 schrieb Joe Burmeister:
>> Hi Wolfgang,
>>
>> On 21/06/18 09:25, Wolfgang Grandegger wrote:
>>> Hello Joe,
>>>
>>> Am 21.06.2018 um 09:55 schrieb Joe Burmeister:
>>>> Hi Wolfgang
>>>>
>>>>
>>>> On 21/06/18 08:24, Wolfgang Grandegger wrote:
>>>>> Hello Joe,
>>>>>
>>>>> I have some more questions...
>>>>>
>>>>> Am 20.06.2018 um 19:00 schrieb Joe Burmeister:
>>>>>> Hi,
>>>>>>
>>>>>> I've bumped into what I think is a chip bug that the C_CAN/D_CAN driver
>>>>>> isn't handling.
>>>>>>
>>>>>> It can get into a state where the chip status register reports bus-off,
>>>>>> but the CAN driver doesn't know, so the bus never gets restarted.
>>>>>>
>>>>>> It looks like the chip either isn't firing the interrupt or is firing it
>>>>>> with the interrupt register reading zero. Either is wrong and means
>>>>>> "c_can_poll" is never called, so the driver never picks up the bus-off.
>>>>>>
>>>>>> We are turning on/off the CAN device we are talking to, and we have to
>>>>>> do this a lot to cause this. But we can get into this state and then the
>>>>> By on/off, do you mean "ifconfig up/down"?
>>>> No, literally powering on and off the device we are talking to
>>>> over CAN.
>>>> Its power is controlled by a GPIO line on the BBB, and part of the
>>>> normal operation is to turn it on and off.
>>>> But in the test we do that a lot, to reproduce this bug that we only
>>>> see once in a blue moon.
>>>>
>>>>> Is it always the first bus-off making trouble after you switched on the
>>>>> device?
>>>> No. Even in the test, most of the time the test iteration completes
>>>> without issue.
>>>>
>>>>> Does the "bus-off" condition occur frequently?
>>>> Even with the test, where an iteration lasts about 30 seconds, it can
>>>> take over 5 minutes.
>>> I mean: do bus-off conditions occur frequently on the bus? At what rate?
>> It happens when the power to the device is being switched on or off.
>> We have some contactors on a high-power circuit that probably introduce
>> a fair amount of noise on connect/disconnect, causing CAN errors. The
>> device we are talking to gets its power from the same circuit. Though
>> it's fine once up, it is born and dies in a hellfire of noise. But CAN
>> should be OK with that.
> So you have a bus error storm when the device is switched on (and off).
> I suspect the problem occurs while initializing the CAN device.
>
> [...snip...]

The device is designed for this harsh environment (not by us), though it
too is still in development. CAN should be pretty fault tolerant, and it's
the c_can/d_can driver that has the issue: it gets out of sync with the
chip's status.

When I get time on the setup, I'll see if that interrupt change stops
this from happening.

[...snip...]

Joe