Hello Joe,

On 21.06.2018 at 11:21, Joe Burmeister wrote:
> On 21/06/18 10:07, Wolfgang Grandegger wrote:
>> Hello Joe,
>>
>> On 21.06.2018 at 10:57, Joe Burmeister wrote:
>>> Hi Wolfgang,
>>>
>>> On 21/06/18 09:25, Wolfgang Grandegger wrote:
>>>> Hello Joe,
>>>>
>>>> On 21.06.2018 at 09:55, Joe Burmeister wrote:
>>>>> Hi Wolfgang,
>>>>>
>>>>>
>>>>> On 21/06/18 08:24, Wolfgang Grandegger wrote:
>>>>>> Hello Joe,
>>>>>>
>>>>>> I have some more questions...
>>>>>>
>>>>>> On 20.06.2018 at 19:00, Joe Burmeister wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've bumped into what I think is a chip bug that the C_CAN/D_CAN driver
>>>>>>> isn't handling.
>>>>>>>
>>>>>>> It can get into a state where the chip status register reports it's bus
>>>>>>> off, but the CAN driver doesn't know, so the bus never gets restarted.
>>>>>>>
>>>>>>> Looks like the chip isn't firing the interrupt or is firing with the
>>>>>>> interrupt register as zero. Either is wrong and means "c_can_poll" is
>>>>>>> never called, and thus the driver never picks up the bus off.
>>>>>>>
>>>>>>> We are turning on/off the CAN device we are talking to, and we have to
>>>>>>> do this a lot to cause this. But we can get into this state and then the
>>>>>> By on/off, do you mean "ifconfig up/down"?
>>>>> No, literally power on and power off to the device we are talking to
>>>>> over CAN.
>>>>> Its power is controlled by a GPIO line on the BBB and part of the
>>>>> normal operation is to turn it on and off.
>>>>> But in the test, we do that a lot to reproduce this bug we only saw once
>>>>> in a blue moon.
>>>>>
>>>>>> Is it always the first bus-off making trouble after you switched on the
>>>>>> device?
>>>>> No, even in the test, most of the time, the test iteration completes
>>>>> without issue.
>>>>>
>>>>>> Does the "bus-off" condition occur frequently?
>>>>> Even with the test, where an iteration lasts about 30 seconds, it can
>>>>> take over 5 minutes.
>>>> I mean: do bus-off conditions occur frequently on the bus?
>>>> At what rate?
>>> It's when the power is going on or off to the device. We have some
>>> contactors to some big power that probably introduces a fair amount of
>>> noise on connect/disconnect, causing CAN errors. The device we are
>>> talking to gets power from the same circuit. Though it's fine once up,
>>> it is born and dies in a hell fire of noise. But CAN should be ok with that.
>> So you have a bus error storm when the device is switched on (and off).
>> I suspect that the problem occurs while initializing the CAN device.
>>
>> [...snip...]
>
> The device is designed for this harsh environment (not by us), though it
> too is still in development.
> CAN should be pretty fault tolerant and it's the c_can/d_can driver that
> has the issue. It's out of sync with its chip's status.
> When I get time on the setup, I'll see if that interrupt change stops
> that happening.

I didn't say that it's your fault ;). I just want to understand what
could cause the problem! I don't think it's hardware, either.

Wolfgang.