Hello Joe,

On 21.06.2018 at 13:40, Wolfgang Grandegger wrote:
> Hello Joe,
>
> On 21.06.2018 at 13:29, Joe Burmeister wrote:
>> Hi Wolfgang,
>>
>> On 21/06/18 11:28, Wolfgang Grandegger wrote:
>>
>> [...snip...]
>>>>>> It's fairly rare to happen even in our unusually harsh environment,
>>>>>> unless we really push it with an unrealistically tight loop, so this
>>>>>> definitely comes under crazy edge cases. ;-)
>>>>> Your conditions are special... maybe that's why the problem did not show
>>>>> up yet.
>>>> We're sure of it. It only showed up now as we are soak testing to try
>>>> and reproduce these rare bugs.
>>>>>> I'm just waiting for my turn to try my interrupt idea on the hardware (I
>>>>>> hogged it yesterday) and reading the datasheet.
>>>>>> Even if that works, I'm still thinking it might be an idea to check for
>>>>>> bus off in the send as a fall back with a "this should never happen"
>>>>>> kind of warning.
>>>>> BTW, what hardware (Board/CPU) are you using?
>>>> It's a Beagle Bone Black, TI's AM335x. It's part of a stack of prototype
>>>> boards we designed.
>>> OK. How do you recover from bus-off? What "ip" command do you use to
>>> configure the device?
>>
>> This is in a small Buildroot image and we're using ifup and ifdown:
>
> With ifup/ifconfig you cannot specify CAN bit timing etc. I mean, how
> do you recover from bus-off? There are two options, manually or
> automatically:
>
> - ip link set canX type can restart
>
> - ip link set canX up txqueuelen 1000 type can bitrate 250000 restart-ms 100
>
> The latter will restart the device automatically after 100 ms.
>
>> https://ss64.com/bash/ifup.html
>>
>> It uses /etc/network/interfaces to know how to set up the CAN interfaces.
>>
>> I'm on the machine now and trying things.
>>
>> I can reduce the problem with:
>>
>> static irqreturn_t c_can_isr(int irq, void *dev_id)
>> {
>>     struct net_device *dev = (struct net_device *)dev_id;
>>     struct c_can_priv *priv = netdev_priv(dev);
>>
>>     if (!priv->read_reg(priv, C_CAN_INT_REG)) {
>>         // It is possible the interrupt register has been cleared
>>         if (priv->last_status == priv->read_reg(priv, C_CAN_STS_REG))
>>             return IRQ_NONE;
>>     }
>>
>>     /* disable all interrupts and schedule the NAPI */
>>     c_can_irq_control(priv, false);
>>     napi_schedule(&priv->napi);
>>
>>     return IRQ_HANDLED;
>> }
>>
>>
>> If I add a flag variable in "priv" and print out if it is set in
>> "c_can_poll" I can see there are times that this is catching.
>> But it's just failed so this isn't a complete solution.
>
> Is it a C_CAN or D_CAN controller? On D_CAN, this register is self

Most probably it is a D_CAN controller:

  { .compatible = "ti,am3352-d_can", .data = &am3352_dcan_drvdata },

> clearing. You need to be careful reading that register.

I would add a "pr_info("%s: sts=%#x\n", __func__, curr);" right after
"curr = priv->read_reg(priv, C_CAN_STS_REG);" in "c_can_poll()".

In case of trouble you see:

  can0 20000004 [8] 00 04 00 00 00 00 00 79 ERRORFRAME
  can0 20000004 [8] 00 10 00 00 00 00 00 79 ERRORFRAME

The TX error counter is the same for both messages. For error passive it
should be higher, hmm.

When the system hangs, what does the following command report:

  $ ip -d -s link show canX

Wolfgang.
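
For reference, a rough user-space sketch of how the controller-state byte
in error frames like the two above could be decoded. This is not driver
code. The helper names describe_err_frame() and append_flag() are made up
for illustration, and the constants are copied by hand from
<linux/can/error.h> (so treat the values as an assumption to check against
your kernel headers). It only looks at the ID flags and data[1]; it does
not interpret the error counters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from <linux/can/error.h>, redefined so this builds anywhere. */
#define ERR_FLAG        0x20000000U /* CAN_ERR_FLAG: frame is an error message */
#define ERR_CRTL        0x00000004U /* CAN_ERR_CRTL: controller state in data[1] */
#define ERR_BUSOFF      0x00000040U /* CAN_ERR_BUSOFF */
#define CRTL_RX_WARNING 0x04U       /* CAN_ERR_CRTL_RX_WARNING */
#define CRTL_TX_WARNING 0x08U       /* CAN_ERR_CRTL_TX_WARNING */
#define CRTL_RX_PASSIVE 0x10U       /* CAN_ERR_CRTL_RX_PASSIVE */
#define CRTL_TX_PASSIVE 0x20U       /* CAN_ERR_CRTL_TX_PASSIVE */

/* Hypothetical helper: append s to buf, comma-separated, within len bytes. */
static void append_flag(char *buf, size_t len, const char *s)
{
    size_t used = strlen(buf);

    snprintf(buf + used, len - used, "%s%s", used ? "," : "", s);
}

/* Hypothetical helper: describe an error frame's ID flags and data[1]. */
static const char *describe_err_frame(uint32_t can_id, const uint8_t data[8],
                                      char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    if (!(can_id & ERR_FLAG)) {
        append_flag(buf, len, "not an error frame");
        return buf;
    }
    if (can_id & ERR_BUSOFF)
        append_flag(buf, len, "bus-off");
    if (can_id & ERR_CRTL) {
        if (data[1] & CRTL_RX_WARNING)
            append_flag(buf, len, "rx-warning");
        if (data[1] & CRTL_TX_WARNING)
            append_flag(buf, len, "tx-warning");
        if (data[1] & CRTL_RX_PASSIVE)
            append_flag(buf, len, "rx-passive");
        if (data[1] & CRTL_TX_PASSIVE)
            append_flag(buf, len, "tx-passive");
    }
    return buf;
}
```

Fed the two frames above (ID 0x20000004 with data[1] = 0x04 and then 0x10),
this yields "rx-warning" and "rx-passive" respectively.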