Hello Joe,

On 21.06.2018 at 13:29, Joe Burmeister wrote:
> Hi Wolfgang,
>
> On 21/06/18 11:28, Wolfgang Grandegger wrote:
>
> [...snip...]
>>>>> It's fairly rare to happen even in our unusually harsh environment,
>>>>> unless we really push it with unrealistically tight loop, so this
>>>>> definitely comes under crazy edge cases. ;-)
>>>> Your conditions are special... maybe that's why the problem did not show
>>>> up yet.
>>> We're sure of it. It only showed up now as we are soak testing to try
>>> and reproduce these rare bugs.
>>>>> I'm just waiting for my turn to try my interrupt idea on the hardware (I
>>>>> hogged it yesterday) and reading the datasheet.
>>>>> Even if that works, I'm still thinking it might be an idea to check for
>>>>> bus off in the send as a fall back with a "this should never happen"
>>>>> kind of warning.
>>>> BTW, what hardware (Board/CPU) are you using?
>>> It's a Beagle Bone Black, TI's AM335x. It's part of a stack of prototype
>>> boards we designed.
>> OK. How do you recover from bus-off? What "ip" command do you use to
>> configure the device?
>
> This is in a small Buildroot image and we're using ifup and ifdown:

With ifup/ifconfig you cannot specify the CAN bit timing etc. What I
meant was: how do you recover from bus-off? There are two options,
manual or automatic:

- ip link set canX type can restart
- ip link set canX up txqueuelen 1000 type can bitrate 250000 restart-ms 100

The latter restarts the device automatically 100 ms after bus-off.

> https://ss64.com/bash/ifup.html
>
> It uses /etc/network/interfaces to know how to setup the CAN interfaces.
>
> I'm on the machine now and trying things.
>
> I can reduce the problem with:
>
> static irqreturn_t c_can_isr(int irq, void *dev_id)
> {
> 	struct net_device *dev = (struct net_device *)dev_id;
> 	struct c_can_priv *priv = netdev_priv(dev);
>
> 	if (!priv->read_reg(priv, C_CAN_INT_REG)) {
> 		// It is possible the interrupt register has been cleared
> 		if (priv->last_status == priv->read_reg(priv, C_CAN_STS_REG))
> 			return IRQ_NONE;
> 	}
>
> 	/* disable all interrupts and schedule the NAPI */
> 	c_can_irq_control(priv, false);
> 	napi_schedule(&priv->napi);
>
> 	return IRQ_HANDLED;
> }
>
>
> If I add a flag variable in "priv" and print out if it is set in
> "c_can_poll" I can see there are times that this is catching.
> But it's just failed so this isn't a complete solution.

Is it a C_CAN or D_CAN controller? On D_CAN, this register is self
clearing. You need to be careful reading that register.

Wolfgang.
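
To illustrate the warning about self-clearing registers, here is a
minimal user-space sketch. It is not driver code: struct fake_reg,
fake_read_clear() and both handlers are hypothetical names made up for
this illustration, and the model does not reflect the real C_CAN/D_CAN
register layout. It only assumes a register whose latched bits are
cleared by the act of reading it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a read-to-clear register (assumption, not the
 * real C_CAN/D_CAN layout). */
struct fake_reg {
	uint32_t pending;	/* bits latched by the "hardware" */
};

/* A read returns the latched bits and clears them as a side effect. */
static uint32_t fake_read_clear(struct fake_reg *r)
{
	uint32_t v = r->pending;

	r->pending = 0;
	return v;
}

/* Risky pattern: one read to test the register, a second read to use it.
 * The first read already consumed the event, so the second returns 0. */
uint32_t handle_two_reads(struct fake_reg *r)
{
	if (!fake_read_clear(r))
		return 0;
	return fake_read_clear(r);
}

/* Safer pattern: read exactly once and work on the cached value. */
uint32_t handle_one_read(struct fake_reg *r)
{
	uint32_t cached = fake_read_clear(r);

	return cached;
}

/* Small self-check showing the difference between the two patterns. */
void demo(void)
{
	struct fake_reg r = { .pending = 0x8000u };

	assert(handle_two_reads(&r) == 0);	/* event lost */

	r.pending = 0x8000u;
	assert(handle_one_read(&r) == 0x8000u);	/* event kept */
}
```

With such a register, any extra read (in the ISR, in the poll
function, or in debug code) can swallow an event that a later read
expected to see, so the value should be read once and passed along.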