Re: [PATCH] can: c_can: Handle lost bus-off interrupt while IRQs are disabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This patch looks very familiar.
I tried this 6 months ago, when I got a setup where I could reproduce
this within minutes.

This patch didn't solve my problem, and I fixed my problem with reading the
ISR register only once in the IRQ, and _not_ 2nd time in napi handler.
Reading the ISR in C_CAN is consuming the info, so unless the there is
more state to clear (such as received message buffers), the interrupt
cause is assumed to be known, and the C_CAN proceeds to the next
interrupt cause.
I've not experienced a single problem since, where before the fix, I'd
experience a few occurances every month.
We have an am3358 (beaglebone-like hardware).

I believe I submitted a patch for that.

Re-scheduling the napi is a workaround, IMHO.

Kurt

On do, 26 mrt 2020 11:43:18 +0100, Felix Riemann wrote:
> There are appears to be a race condition where interrupts caused
> by bus offs get lost if it occurs while interrupts are disabled
> or being re-enabled.
> 
> This tries to avoid the deadlock by rescheduling the NAPI worker
> to handle the bus-off condition.
> 
> Signed-off-by: Felix Riemann <felix.riemann@xxxxxx>
> Reviewed-by: Andre Kalb <andre.kalb@xxxxxx>
> ---
>  drivers/net/can/c_can/c_can.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
> index 8e9f5620c9a2..0952ac0b9123 100644
> --- a/drivers/net/can/c_can/c_can.c
> +++ b/drivers/net/can/c_can/c_can.c
> @@ -1123,8 +1123,25 @@ static int c_can_poll(struct napi_struct *napi, int quota)
>  	if (work_done < quota) {
>  		napi_complete_done(napi, work_done);
>  		/* enable all IRQs if we are not in bus off state */
> -		if (priv->can.state != CAN_STATE_BUS_OFF)
> +		if (priv->can.state != CAN_STATE_BUS_OFF) {
> +			u32 ctrl;
>  			c_can_irq_control(priv, true);
> +
> +			/* There appears to be a race condition when the device
> +			 * enters bus off while interrupts are off or being
> +			 * re-enabled causing the bus off to get lost.
> +			 * This tries to avoid this condition.
> +			 */
> +			ctrl = priv->read_reg(priv, C_CAN_CTRL_REG);
> +
> +			if (ctrl & CONTROL_INIT) {
> +				netdev_warn(dev, "lost bus off\n");
> +				c_can_irq_control(priv, false);
> +				/* Reschedule worker to handle bus off */
> +				atomic_set(&priv->sie_pending, 1);
> +				napi_reschedule(napi);
> +			}
> +		}
>  	}
>  
>  	return work_done;
> -- 
> 2.26.0
> 



[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux