Re: C_CAN/D_CAN bug and fix

Hello Joe,

On 21.06.2018 at 13:40, Wolfgang Grandegger wrote:
> Hello Joe,
> 
> On 21.06.2018 at 13:29, Joe Burmeister wrote:
>> Hi Wolfgang,
>>
>> On 21/06/18 11:28, Wolfgang Grandegger wrote:
>>
>> [...snip...]
>>>>>> It's fairly rare to happen even in our unusually harsh environment,
>>>>>> unless we really push it with unrealistically tight loop, so this
>>>>>> definitely comes under crazy edge cases. ;-)
>>>>> Your conditions are special... maybe that's why the problem did not show
>>>>> up yet.
>>>> We're sure of it. It only showed up now as we are soak testing to try
>>>> and reproduce these rare bugs.
>>>>>> I'm just waiting for my turn to try my interrupt idea on the hardware (I
>>>>>> hogged it yesterday) and reading the datasheet.
>>>>>> Even if that works, I'm still thinking it might be an idea to check for
>>>>>> bus off in the send as a fall back with a "this should never happen"
>>>>>> kind of warning.
>>>>> BTW, what hardware (Board/CPU) are you using?
>>>> It's a Beagle Bone Black, TI's AM335x. It's part of a stack of prototype
>>>> boards we designed.
>>> OK. How do you recover from bus-off? What "ip" command do you use to
>>> configure the device?
>>
>> This is in a small Buildroot image and we're using ifup and ifdown:
> 
> With ifup/ifconfig you cannot specify CAN bit timing etc. I mean, how
> do you recover from bus-off? There are two options, manually or
> automatically:
> 
> - ip link set canX type can restart
> 
> - ip link set canX up txqueuelen 1000 type can bitrate 250000 restart-ms 100
> 
> The latter will restart the device automatically after 100 ms.
> 
>> https://ss64.com/bash/ifup.html
>>
>> It uses /etc/network/interfaces to know how to setup the CAN interfaces.
>>
>> I'm on the machine now and trying things.
>>
>> I can reduce the problem with:
>>
>> static irqreturn_t c_can_isr(int irq, void *dev_id)
>> {
>>     struct net_device *dev = (struct net_device *)dev_id;
>>     struct c_can_priv *priv = netdev_priv(dev);
>>
>>     if (!priv->read_reg(priv, C_CAN_INT_REG)) {
>>         // It is possible the interrupt register has been cleared
>>         if (priv->last_status == priv->read_reg(priv, C_CAN_STS_REG))
>>             return IRQ_NONE;
>>     }
>>
>>     /* disable all interrupts and schedule the NAPI */
>>     c_can_irq_control(priv, false);
>>     napi_schedule(&priv->napi);
>>
>>     return IRQ_HANDLED;
>> }
>>
>>
>> If I add a flag variable in "priv" and print out if it is set in
>> "c_can_poll" I can see there are times that this is catching.
>> But it's just failed so this isn't a complete solution.
> 
> Is it a C_CAN or D_CAN controller? On D_CAN, this register is self 

Most probably it is a D_CAN controller:

  { .compatible = "ti,am3352-d_can", .data = &am3352_dcan_drvdata },

> clearing. You need to be careful reading that register.

I would add a "pr_info("%s: sts=%#x\n", __func__, curr);" right after
"curr = priv->read_reg(priv, C_CAN_STS_REG);" in "c_can_poll()".

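To read the printed "sts" values more easily, something like the following
user-space sketch might help. It is not part of the driver. The bit positions
are assumed to match the STATUS_* definitions in the c_can driver source
(BOff, EWarn, EPass, RxOk, TxOk and the 3-bit LEC field), and the helper name
sts_to_string() is made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror the STATUS_* bits of the c_can driver. */
#define STS_BOFF     (1u << 7)  /* bus-off */
#define STS_EWARN    (1u << 6)  /* error warning */
#define STS_EPASS    (1u << 5)  /* error passive */
#define STS_RXOK     (1u << 4)  /* message received successfully */
#define STS_TXOK     (1u << 3)  /* message transmitted successfully */
#define STS_LEC_MASK 0x07u      /* last error code */

/*
 * Hypothetical helper: format a status register value as text.
 * Returns the snprintf() result (characters that would be written).
 */
static int sts_to_string(uint32_t sts, char *buf, size_t len)
{
    return snprintf(buf, len, "%s%s%s%s%slec=%u",
                    (sts & STS_BOFF)  ? "BOFF "  : "",
                    (sts & STS_EWARN) ? "EWARN " : "",
                    (sts & STS_EPASS) ? "EPASS " : "",
                    (sts & STS_RXOK)  ? "RXOK "  : "",
                    (sts & STS_TXOK)  ? "TXOK "  : "",
                    (unsigned int)(sts & STS_LEC_MASK));
}
```

Feeding it the value from each "sts=" log line would show at a glance whether
the controller reported bus-off before the interface stopped transmitting.
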
In case of trouble you see:

  can0  20000004   [8]  00 04 00 00 00 00 00 79   ERRORFRAME
  can0  20000004   [8]  00 10 00 00 00 00 00 79   ERRORFRAME

The TX error counter is the same for both messages. For error passive it
should be higher, hmm. When the system hangs, what does the following
command report:

  $ ip -d -s link show canX
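
For reference, the two error frames quoted above can be decoded with a small
sketch like the one below. It assumes the layout documented in
linux/can/error.h: CAN_ERR_FLAG and CAN_ERR_CRTL in the CAN ID, controller
state flags in data[1], and the TX/RX error counters in data[6]/data[7]. The
constants are copied locally so the example builds without kernel headers;
the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Local copies of values assumed from linux/can.h and linux/can/error.h. */
#define ERR_FLAG             0x20000000u /* CAN_ERR_FLAG */
#define ERR_CRTL             0x00000004u /* CAN_ERR_CRTL */
#define ERR_CRTL_RX_WARNING  0x04u
#define ERR_CRTL_TX_WARNING  0x08u
#define ERR_CRTL_RX_PASSIVE  0x10u
#define ERR_CRTL_TX_PASSIVE  0x20u

/* Hypothetical summary of a controller-state error frame. */
struct err_summary {
    int is_ctrl_error;           /* error frame with CAN_ERR_CRTL set */
    int rx_warning, tx_warning;  /* error warning level reached */
    int rx_passive, tx_passive;  /* error passive level reached */
    unsigned int txerr, rxerr;   /* counters taken from data[6], data[7] */
};

static struct err_summary decode_err_frame(uint32_t can_id,
                                           const uint8_t data[8])
{
    struct err_summary s = {0};

    /* Only frames flagged as controller-state errors are decoded. */
    s.is_ctrl_error = (can_id & ERR_FLAG) && (can_id & ERR_CRTL);
    if (!s.is_ctrl_error)
        return s;

    s.rx_warning = !!(data[1] & ERR_CRTL_RX_WARNING);
    s.tx_warning = !!(data[1] & ERR_CRTL_TX_WARNING);
    s.rx_passive = !!(data[1] & ERR_CRTL_RX_PASSIVE);
    s.tx_passive = !!(data[1] & ERR_CRTL_TX_PASSIVE);
    s.txerr = data[6];
    s.rxerr = data[7];
    return s;
}
```

Passing in the ID 0x20000004 and the eight data bytes from each candump line
gives the state flags and counters side by side, which makes it easier to
compare the warning frame with the passive frame.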

Wolfgang.
