Re: C_CAN/D_CAN bug and fix

On 21/06/18 10:07, Wolfgang Grandegger wrote:
> Hello Joe,
>
> Am 21.06.2018 um 10:57 schrieb Joe Burmeister:
>> Hi Wolfgang,
>>
>> On 21/06/18 09:25, Wolfgang Grandegger wrote:
>>> Hello Joe,
>>>
>>> Am 21.06.2018 um 09:55 schrieb Joe Burmeister:
>>>> Hi Wolfgang
>>>>
>>>>
>>>> On 21/06/18 08:24, Wolfgang Grandegger wrote:
>>>>> Hello Joe,
>>>>>
>>>>> I have some more questions...
>>>>>
>>>>> Am 20.06.2018 um 19:00 schrieb Joe Burmeister:
>>>>>> Hi,
>>>>>>
>>>>>> I've bumped into what I think is a chip bug that the C_CAN/D_CAN driver
>>>>>> isn't handling.
>>>>>>
>>>>>> It can get into a state where the chip status register reports it's bus
>>>>>> off, but the CAN driver doesn't know, so the bus never gets restarted.
>>>>>>
>>>>>> Looks like the chip isn't firing the interrupt or is firing with the
>>>>>> interrupt register as zero. Either is wrong and means "c_can_poll" is
>>>>>> never called, and thus the driver never picks up the bus off.
>>>>>>
>>>>>> We are turning on/off the can device we are talking to, and we have to
>>>>>> do this a lot to cause this. But we can get into this state and then the
>>>>> With on/off you mean "ifconfig up/down"?
>>>> No, literally power on and power off to the device we are talking to
>>>> over CAN.
>>>> Its power is controlled by a GPIO line on the BBB, and part of the
>>>> normal operation is to turn it on and off.
>>>> But in the test, we do that a lot to reproduce this bug, which we
>>>> otherwise only saw once in a blue moon.
>>>>
>>>>> Is it always the first bus-off making trouble after you switched on the
>>>>> device?
>>>> No, even in the test, most of the time, the test iteration completes
>>>> without issue.
>>>>
>>>>> Does the "bus-off" condition occur frequently?
>>>> Even with the test, where an iteration lasts about 30 seconds, it can
>>>> take over 5 minutes.
>>> I mean: do bus-off conditions occur frequently on the bus? At what rate?
>> It's when the power is going on or off to the device. We have some
>> contactors to some big power that probably introduce a fair amount of
>> noise on connect/disconnect, causing CAN errors. The device we are
>> talking to gets power from the same circuit. Though it's fine once up,
>> it is born and dies in a hellfire of noise. But CAN should be OK with that.
> So you have a bus error storm when the device is switched on (and off).
> I suspect that the problem occurs while initializing the CAN device.
>
> [...snip...]

The device is designed for this harsh environment (not by us), though it
too is still in development.
CAN should be pretty fault tolerant, and it's the c_can/d_can driver that
has the issue. It's out of sync with its chip's status.
When I get time on the setup, I'll see if that interrupt change stops
that happening.
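
To make the "out of sync" condition concrete, here is a minimal sketch in
plain C. It is not the driver code. The status bit positions follow the
Bosch C_CAN status register layout (BOff, EWarn, EPass); the names
sw_state, can_snapshot, hw_state and bus_off_missed are hypothetical and
exist only for this illustration. It assumes you can somehow obtain the raw
status register value alongside the state the driver believes it is in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Status register bits per the Bosch C_CAN layout. */
#define STATUS_BOFF  (1u << 7)  /* controller is bus off */
#define STATUS_EWARN (1u << 6)  /* error warning limit reached */
#define STATUS_EPASS (1u << 5)  /* controller is error passive */

/* Hypothetical, simplified stand-in for the driver's idea of bus state. */
enum sw_state {
    SW_ERROR_ACTIVE,
    SW_ERROR_WARNING,
    SW_ERROR_PASSIVE,
    SW_BUS_OFF
};

/* Hypothetical snapshot: raw chip status plus what the driver believes. */
struct can_snapshot {
    uint32_t status_reg;
    enum sw_state sw_state;
};

/* Decode the chip's own view, most severe condition first. */
static enum sw_state hw_state(uint32_t status)
{
    if (status & STATUS_BOFF)
        return SW_BUS_OFF;
    if (status & STATUS_EPASS)
        return SW_ERROR_PASSIVE;
    if (status & STATUS_EWARN)
        return SW_ERROR_WARNING;
    return SW_ERROR_ACTIVE;
}

/* True when the chip reports bus off but the software never noticed,
 * which is the stuck state described above: no restart is scheduled. */
static bool bus_off_missed(const struct can_snapshot *s)
{
    assert(s != NULL);
    return hw_state(s->status_reg) == SW_BUS_OFF &&
           s->sw_state != SW_BUS_OFF;
}
```

A periodic check along these lines would flag the stuck state even if the
interrupt never fires, but whether that belongs in the driver or in a
watchdog outside it is an open question.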

[...snip...]


Joe