RE: [PATCH] can: c_can: Handle lost bus-off interrupt while IRQs are disabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Kurt,

yes that would be 3cb3eaac52c0f145d895f4b6c22834d5f02b8569.

I tried your patch as well, since the description matched the problem pretty well. However, I could still produce the hang on our board (based on AM3352) especially if the bus off occurs while the bus load is very high.

What I could see was the worker being triggered by the interrupt with the bus being in passive mode. When the worker then re-enabled the interrupt the hardware had already entered init mode due to the bus-off but nor would it trigger again nor would the interrupt register indicate a pending IRQ. The status register also indicates an error (LEC != 0x7).

A first attempt to fix/workaround this placed the check just before the worker re-enables the interrupts. But even this setup could reproduce the problem, although it took longer to do so.  So the timing around enabling the interrupts seems to matter here. Actually, adding a printout at the top of the worker where your patch checks the value of the sie_pending atomic seems to increase the chances of making it happen on my board, but that's just a feeling I got while cleaning up my patch.

Regards,

Felix

-----Original Message-----
From: Kurt Van Dijck <dev.kurt@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Thursday, March 26, 2020 2:40 PM

This patch looks very familiar.
I tried this 6 months ago, when I got a setup where I could reproduce this within minutes.

This patch didn't solve my problem, and I fixed my problem with reading the ISR register only once in the IRQ, and _not_ 2nd time in napi handler.
Reading the ISR in C_CAN is consuming the info, so unless the there is more state to clear (such as received message buffers), the interrupt cause is assumed to be known, and the C_CAN proceeds to the next interrupt cause.
I've not experienced a single problem since, where before the fix, I'd experience a few occurances every month.
We have an am3358 (beaglebone-like hardware).

I believe I submitted a patch for that.

Re-scheduling the napi is a workaround, IMHO.

Kurt

On do, 26 mrt 2020 11:43:18 +0100, Felix Riemann wrote:
> There are appears to be a race condition where interrupts caused by
> bus offs get lost if it occurs while interrupts are disabled or being
> re-enabled.
>
> This tries to avoid the deadlock by rescheduling the NAPI worker to
> handle the bus-off condition.
>
> Signed-off-by: Felix Riemann <felix.riemann@xxxxxx>
> Reviewed-by: Andre Kalb <andre.kalb@xxxxxx>
> ---
>  drivers/net/can/c_can/c_can.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/can/c_can/c_can.c
> b/drivers/net/can/c_can/c_can.c index 8e9f5620c9a2..0952ac0b9123
> 100644
> --- a/drivers/net/can/c_can/c_can.c
> +++ b/drivers/net/can/c_can/c_can.c
> @@ -1123,8 +1123,25 @@ static int c_can_poll(struct napi_struct *napi, int quota)
>  if (work_done < quota) {
>  napi_complete_done(napi, work_done);
>  /* enable all IRQs if we are not in bus off state */
> -if (priv->can.state != CAN_STATE_BUS_OFF)
> +if (priv->can.state != CAN_STATE_BUS_OFF) {
> +u32 ctrl;
>  c_can_irq_control(priv, true);
> +
> +/* There appears to be a race condition when the device
> + * enters bus off while interrupts are off or being
> + * re-enabled causing the bus off to get lost.
> + * This tries to avoid this condition.
> + */
> +ctrl = priv->read_reg(priv, C_CAN_CTRL_REG);
> +
> +if (ctrl & CONTROL_INIT) {
> +netdev_warn(dev, "lost bus off\n");
> +c_can_irq_control(priv, false);
> +/* Reschedule worker to handle bus off */
> +atomic_set(&priv->sie_pending, 1);
> +napi_reschedule(napi);
> +}
> +}
>  }
>
>  return work_done;
> --
> 2.26.0
>
___________________________________________________

SMA Solar Technology AG
Aufsichtsrat: Dr. Erik Ehrentraut (Vorsitzender)
Vorstand: Ulrich Hadding, Dr.-Ing. Juergen Reinert
Handelsregister: Amtsgericht Kassel HRB 3972
Sitz der Gesellschaft: 34266 Niestetal
USt-ID-Nr. DE 113 08 59 54
WEEE-Reg.-Nr. DE 95881150
___________________________________________________




[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux