RE: [RFC PATCH] can: xilinx_can: fix chip_start failure with invalid bus

Appana Durga Kedareswara Rao <appanad@xxxxxxxxxx> · Mon, 18 Mar 2019 11:25:04 +0000

Hi Anssi Hannula,

> -----Original Message-----
> From: Anssi Hannula <anssi.hannula@xxxxxxxxxx>
> Sent: Monday, March 18, 2019 4:52 PM
> To: Appana Durga Kedareswara Rao <appanad@xxxxxxxxxx>; Michal Simek
> <michals@xxxxxxxxxx>; linux-can@xxxxxxxxxxxxxxx; Marc Kleine-Budde
> <mkl@xxxxxxxxxxxxxx>
> Cc: Wolfgang Grandegger <wg@xxxxxxxxxxxxxx>; Shubhrajyoti Datta
> <shubhraj@xxxxxxxxxx>; Naga Sureshkumar Relli <nagasure@xxxxxxxxxx>
> Subject: Re: [RFC PATCH] can: xilinx_can: fix chip_start failure with invalid bus
> 
> On 12.3.2019 13:27, Appana Durga Kedareswara Rao wrote:
> > Hi Anssi Hannula,
> >
> > Thanks for the patch...
> > Please find response inline.
> 
> Hi, and sorry for the delay in response.
> 
> > <Snip>
> >> On 07. 03. 19 13:52, Anssi Hannula wrote:
> >>> Currently the xilinx_can xcan_chip_start() function, called from
> >>> .ndo_open() and via CAN_MODE_START (bus-off restart), waits for the
> >>> SR register to show the wanted operating state, with a 1 sec timeout.
> >>>
> >>> However, that register bit will only be set once the HW has observed
> >>> 11 consecutive recessive bits (BusIdle) on the bus.
> >>>
> >>> If the bus will not see the 11 bits (e.g. it is stuck dominant), the
> >>> function will timeout and return an error.
> >>> If this was done as part of a scheduled restart from bus-off, the
> >>> interface will stay in bus-off state forever even if the bus
> >>> recovers later.
> >>> If this was done during interface .ndo_open(), the interface will
> >>> stay down.
> >>>
> >>> According to M_CAN and FLEXCAN documentation they also wait for 11
> >>> consecutive recessive bits, but their drivers do not seem to wait
> >>> for that.
> >>>
> >>> To make the behavior consistent, modify xilinx_can to also not wait
> >>> for the synchronization to complete.
> >>>
> >>> The only way for users to know for sure that the bus has been joined
> >>> successfully is to see successfully received or transmitted frames.
> >>> That does not seem optimal, but it is consistent with other drivers
> >>> and we should have a properly working restart-ms with xilinx_can.
> >>>
> >>> Tested on ZynqMP with Xilinx CAN-FD 1.0.
> >>>
> >>> Signed-off-by: Anssi Hannula <anssi.hannula@xxxxxxxxxx>
> >>> ---
> >>>
> >>> Not really sure if this way is the preferred way forward.
> >>>
> >>> One alternative way would be to keep the wait but log a warning and
> >>> allow the function to succeed?
> >>>
> >>> But I believe something needs to be done, as it seems a reasonable
> >>> expectation that if an interface is brought up with restart-ms, the
> >>> interface should recover to a working state even after periods of
> >>> bus being in whatever invalid state.
> >>>
> >>>
> >>> The HW waiting is described for Xilinx HW in [1] "Register
> >>> Configuration Sequence", for M_CAN in [2] 3.1.1 Software
> >>> Initialization, for flexcan in [3] 34.4.10.1 Freeze mode.
> >>>
> >>> [1]
> >>>
> >>
> https://www.xilinx.com/support/documentation/ip_documentation/canfd/v
> >> 2
> >>> _0/pg223-canfd.pdf [2]
> >>> http://www.bosch-
> >> semiconductors.com/media/ip_modules/pdf_2/m_can/mcan_
> >>> users_manual_v3212.pdf [3]
> >>> https://www.nxp.com/docs/en/reference-manual/MC56F8458XRM.pdf
> >>>
> >>>
> >>>  drivers/net/can/xilinx_can.c | 13 +------------
> >>>  1 file changed, 1 insertion(+), 12 deletions(-)
> >>>
> >>> diff --git a/drivers/net/can/xilinx_can.c
> >>> b/drivers/net/can/xilinx_can.c index 97d0933d9bd9..13abc9e1a7ed
> >>> 100644
> >>> --- a/drivers/net/can/xilinx_can.c
> >>> +++ b/drivers/net/can/xilinx_can.c
> >>> @@ -400,9 +400,8 @@ static int xcan_set_bittiming(struct net_device
> >>> *ndev)  static int xcan_chip_start(struct net_device *ndev)  {
> >>>  	struct xcan_priv *priv = netdev_priv(ndev);
> >>> -	u32 reg_msr, reg_sr_mask;
> >>> +	u32 reg_msr;
> >>>  	int err;
> >>> -	unsigned long timeout;
> >>>  	u32 ier;
> >>>
> >>>  	/* Check if it is in reset mode */ @@ -428,10 +427,8 @@ static int
> >>> xcan_chip_start(struct net_device *ndev)
> >>>  	/* Check whether it is loopback mode or normal mode  */
> >>>  	if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK) {
> >>>  		reg_msr = XCAN_MSR_LBACK_MASK;
> >>> -		reg_sr_mask = XCAN_SR_LBACK_MASK;
> >>>  	} else {
> >>>  		reg_msr = 0x0;
> >>> -		reg_sr_mask = XCAN_SR_NORMAL_MASK;
> >>>  	}
> >>>
> >>>  	/* enable the first extended filter, if any, as cores with
> >>> extended @@ -443,14 +440,6 @@ static int xcan_chip_start(struct
> net_device *ndev)
> >>>  	priv->write_reg(priv, XCAN_MSR_OFFSET, reg_msr);
> >>>  	priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_CEN_MASK);
> >>>
> >>> -	timeout = jiffies + XCAN_TIMEOUT;
> >>> -	while (!(priv->read_reg(priv, XCAN_SR_OFFSET) & reg_sr_mask)) {
> >>> -		if (time_after(jiffies, timeout)) {
> >>> -			netdev_warn(ndev,
> >>> -				"timed out for correct mode\n");
> >>> -			return -ETIMEDOUT;
> >>> -		}
> >>> -	}
> > Instead of deleting the timeout loop we can check for the bus off state
> here(SR Register -- Error Status fields).
> > If the controller is in bus off state skip this timeout loop...
> 
> Unfortunately the SR and ESR registers read zero in this situation until the 11
> recessive bits have been seen, so that would not work.

Could you please provide the steps to reproduce this issue?? 

Regards,
Kedar.

> 
> Checked on CAN FD 1.0 on ZynqMP.
> 
> 
> > Regards,
> > Kedar.
> >
> >>>  	netdev_dbg(ndev, "status:#x%08x\n",
> >>>  			priv->read_reg(priv, XCAN_SR_OFFSET));
> >>>
> >>>
> >> Xilinx folks: Please take a look at this RFC.
> >>
> >> Thanks,
> >> Michal
> 
> 
> --
> Anssi Hannula / Bitwise Oy
> +358 503803997